[HN Gopher] oneAPI - A cross-industry, open, standards-based uni...
       ___________________________________________________________________
        
       oneAPI - A cross-industry, open, standards-based unified
       programming model
        
       Author : wbryce
       Score  : 66 points
       Date   : 2021-03-01 14:42 UTC (8 hours ago)
        
 (HTM) web link (www.oneapi.com)
 (TXT) w3m dump (www.oneapi.com)
        
       | stonogo wrote:
       | I can't help but assume this idea seemed much more compelling
       | when Intel was still selling Omnipath.
        
       | jdc wrote:
       | There's also OCCA which does JIT compilation to C++, OpenMP,
       | CUDA, HIP, OpenCL and Metal. Originally built at Virginia Tech
       | and now maintained by the Center for Efficient Exascale
       | Discretizations of the DOE.
       | 
       | And it seems to have decent Python bindings.
       | 
       | https://libocca.org
        
       | brianberns wrote:
       | Classic relevant xkcd: https://xkcd.com/927/
        
         | harles wrote:
         | Exactly what I was thinking of. I'm having a lot of trouble
         | understanding the value proposition to vendors here.
        
           | BadInformatics wrote:
           | It's better to think of this as a more friendly (i.e. open
           | source development model) first party compute stack than some
           | kind of pan-vendor standard. For example, anyone using MKL is
           | now nominally using oneAPI libraries. They also went to the
           | trouble of implementing/pushing existing standards instead of
           | baking their own thing: SPIR-V for the IR format, SYCL for
           | the high level programming interface, an OpenCL 3
           | implementation (AIUI they have the most complete
           | implementation of 2.x and 3), etc.
        
             | dyingkneepad wrote:
              | So an "Open Source development model" (developing in the
              | open) is not the same thing as "Open Source code"
              | (developing behind closed doors, then throwing code over
              | the wall once in a while). Intel has a history of doing
              | both, depending on the project and the internal groups
              | involved, and really only the Open Development projects
              | are successful long-term. OneAPI uses a lot of
             | components and I am not entirely sure they all follow the
             | Open Development model. There's a lot of Open Source stuff
             | out there that you basically can't contribute to: your
             | contributions are ignored because source code is open but
             | development isn't. Does anybody here know about how this is
             | done for the OneAPI-related projects?
        
           | someguydave wrote:
           | The value is that Intel wants to compete with nvidia
        
       | rizzir wrote:
       | By reading the name alone I can't help but think of the xkcd
       | related to standards.
        
       | [deleted]
        
       | taylorlapeyre wrote:
       | Situation: there are now n+1 competing standards.
       | 
       | https://xkcd.com/927/
        
       | mhd wrote:
       | My Rational Unified Process PTSD is triggered.
        
       | touisteur wrote:
       | I'm having a hard time trusting Intel with anything AI mid to
       | long term. OpenVINO was great until they removed support for
       | Intel FPGAs, and they've been closing and deprecating low-level
       | APIs that were necessary for low-latency, low-watt work on ex-
       | Movidius hardware (and KeemBay always seems to be 'next
        | semester'...). We already have huge lock-in with Nvidia, but at
        | least we get wide expertise and a huge perf boost. What does
        | oneAPI bring except 'portability' and 'runs as fast as
        | tf/torch/TVM'... OpenCL? Rocm? Vulkan targets? Who's going to
        | debug and support that?
       | 
        | I'm struggling to see what's enticing about this for Python
        | people who already have largely optimised torch and tf CPU and
        | GPU backends, especially for batch work. And for latency-sensitive
       | inference, I thought the 'industry' was going for TVM or other
       | 'target all the things at code level'.
       | 
       | I'm thinking: gimme your network in onnx format, everyone gives a
       | C inference/compilation API, and let everyone optimize /behind/
       | that... Xilinx, AMD, whoever-Google-bought-last-week...
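The split touisteur proposes (a common compile/run surface per vendor, with all optimization hidden behind it) can be sketched as a stdlib-only Python toy; every class, method, and "model format" here is hypothetical, invented for illustration, and not any real vendor API:

```python
# Toy of a vendor-neutral inference interface: every backend exposes the
# same compile/run surface and optimizes privately behind it.
# All names are hypothetical; no real ONNX/vendor API is used.

class InferenceBackend:
    """Common interface each vendor would implement."""
    def compile(self, model_bytes):
        raise NotImplementedError
    def run(self, compiled, inputs):
        raise NotImplementedError

class ReferenceBackend(InferenceBackend):
    # Trivial stand-in: the "model" is just a scale factor,
    # and "inference" multiplies each input by it.
    def compile(self, model_bytes):
        return float(model_bytes.decode())      # pretend compilation step
    def run(self, compiled, inputs):
        return [compiled * v for v in inputs]   # pretend optimized kernel

backend = ReferenceBackend()
model = backend.compile(b"2.0")
print(backend.run(model, [1.0, 3.0]))  # [2.0, 6.0]
```

The point of the design is that callers only ever see the two-method surface, so Xilinx, AMD, or anyone else can swap in arbitrary compilation strategies without breaking users.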
        
       | dljsjr wrote:
       | Something similar that has been around for ages and isn't
       | controlled by Intel is MAGMA:
       | https://bitbucket.org/icl/magma/src/master/
        
         | oivey wrote:
         | Isn't MAGMA more like a canned set of linear algebra routines
         | rather than a general interface for executing arbitrary code on
         | GPUs? That's pretty different.
        
         | jdc wrote:
          | _MAGMA Users' Guide
         | 
         | Univ. of Tennessee, Knoxville
         | 
         | Univ. of California, Berkeley
         | 
         | Univ. of Colorado, Denver
         | 
         | Date                   October 2020
         | 
         | The goal of the MAGMA project is to create a new generation of
         | linear algebra libraries that achieves the fastest possible
         | time to an accurate solution on heterogeneous architectures,
         | starting with current multicore + multi-GPU systems. To address
         | the complex challenges stemming from these systems'
         | heterogeneity, massive parallelism, and the gap between compute
         | speed and CPU-GPU communication speed, MAGMA's research is
         | based on the idea that optimal software solutions will
         | themselves have to hybridize, combining the strengths of
         | different algorithms within a single framework. Building on
         | this idea, the goal is to design linear algebra algorithms and
         | frameworks for hybrid multicore and multi-GPU systems that can
         | enable applications to fully exploit the power that each of the
         | hybrid components offers.
         | 
         | Designed to be similar to LAPACK in functionality, data
         | storage, and interface, the MAGMA library allows scientists to
         | easily port their existing software components from LAPACK to
         | MAGMA, to take advantage of the new hybrid architectures. MAGMA
         | users do not have to know CUDA in order to use the library.
         | 
         | There are two types of LAPACK-style interfaces. The first one,
         | referred to as the CPU interface, takes the input and produces
         | the result in the CPU's memory. The second, referred to as the
         | GPU interface, takes the input and produces the result in the
          | GPU's memory. In both cases, a hybrid CPU/GPU algorithm is
          | used. Also included is MAGMA BLAS, a complement to the CUBLAS
          | routines._
         | 
         | http://icl.utk.edu/projectsfiles/magma/doxygen/index.html
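The two LAPACK-style interface shapes the guide describes (CPU interface: host input, host output, copies managed internally; GPU interface: input and output stay in device memory) can be sketched as a stdlib-only Python toy; the class and function names are invented for illustration and are not MAGMA's actual API:

```python
# Toy of MAGMA's two interface styles (hypothetical names; plain Python
# lists stand in for CPU and GPU memory).

class Device:
    """Pretend GPU with an explicit host<->device copy model."""
    def to_device(self, host):
        return list(host)   # host -> "device" copy
    def to_host(self, dev):
        return list(dev)    # "device" -> host copy

def scale_gpu_interface(dev, d_x, a):
    # GPU interface: operand already lives in device memory,
    # and the result is left there too.
    for i in range(len(d_x)):
        d_x[i] *= a
    return d_x

def scale_cpu_interface(dev, x, a):
    # CPU interface: takes host input, copies in, computes on the
    # device, and copies the result back to host memory.
    d_x = dev.to_device(x)
    scale_gpu_interface(dev, d_x, a)
    return dev.to_host(d_x)

dev = Device()
print(scale_cpu_interface(dev, [1.0, 2.0], 3.0))  # [3.0, 6.0]
```

The GPU-interface form matters when chaining many operations: intermediate results never round-trip through host memory.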
        
       | RcouF1uZ4gsC wrote:
       | Seems like yet another CUDA alternative. Thing is, even if you
        | wanted to move away from CUDA, the churn of the alternative
        | models would be scary. Do you want to develop for CUDA, which
        | has been stable and actively developed for quite some time, or
        | do you want to try a new standard that you don't know if it
        | will be obsolete in
       | a couple of years?
        
         | gbrown_ wrote:
         | > Seems like yet another CUDA alternative.
         | 
         | How many real alternatives are there though aside from OpenCL?
         | Not to mention if you're interested in targeting something that
         | isn't an Nvidia GPU.
        
         | BadInformatics wrote:
          | > do you want to try a new standard that you don't know if it
          | will be obsolete in a couple of years
         | 
         | Some incarnation of oneAPI is bound to exist as long as Intel
         | has a foothold in the HPC market. For example, MKL and MKL-DNN
         | have been rebranded with a oneAPI prefix. So no, the long term
         | stability argument doesn't hold water.
         | 
         | I'd also note that they appear to have added support for cuBLAS
         | and cuDNN as backends in their respective oneAPI libraries. It
          | would be hilarious if that led to more people running oneAPI
         | on non-Intel hardware than first party stuff.
        
           | jcelerier wrote:
           | otoh Intel has over the years developed a few parallel
           | libraries / extensions that have then been deprecated, for
           | instance Cilk Plus...
        
             | volta83 wrote:
             | and ISPC, and... and...
        
               | celrod wrote:
               | Has ispc been deprecated? Last GitHub commit was three
               | days ago.
        
               | moonbug wrote:
               | it's a pet project.
        
       | hctaw wrote:
        | The problem with heterogeneity in GPGPU is that AMD and Intel
        | can't make products that are competitive with Nvidia. The
        | "programming model" for "cross industries" (whatever those mean;
        | the whole naming of this project is weird) wouldn't be a
        | particularly deep moat if there were competitive solutions at
        | lower prices.
        
         | shmerl wrote:
         | AMD GPUs are better than Nvidia for that - they had async
         | compute way longer. I thought the problem was CUDA lock-in and
          | the lack of a nice programming model using something modern
          | like Rust, maybe.
        
           | volta83 wrote:
           | > AMD GPUs are better than Nvidia for that
           | 
           | Better in what sense?
           | 
           | > they had async compute way longer
           | 
           | What is that?
        
             | shmerl wrote:
              | Something like this: http://developer.amd.com/wordpress/media/2012/10/Asynchronou...
        
           | Jhsto wrote:
            | Then again, CUDA is a pretty accessible language for
            | programming SIMD machines. I am a bit sceptical that most
            | people would recognise a subjectively well-done SIMD
            | language (let's say from a correctness and performance
            | standpoint). The programming model for SIMD is inherently
            | parallel, with exceptions for sequential code, which is
            | pretty much the exact opposite of the programming you do on
            | the CPU side. I'd somehow imagine most people would not find
            | that modern at all, or alternatively, that it would not be
            | the language with "product market fit" since it's less
            | likely to catch on. AMD or Intel may have a "better"
            | language in some sense, but it seems most people prefer the
            | familiarity of what they already know.
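The inversion described here (you write the body for one index, and the launch, not the kernel, owns the loop over the index space) can be sketched in stdlib-only Python; `saxpy_kernel` and `launch` are invented names mimicking the shape of a CUDA-style kernel, not any real API:

```python
# SPMD-style sketch: write the body as seen by ONE "thread" (like a
# CUDA kernel), then launch it across the whole index space.
# Plain Python, illustrative only.

def saxpy_kernel(i, a, x, y, out):
    # A single thread's view: no loop over elements here.
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    # The runtime, not the kernel, iterates over the index space
    # (a real GPU would run these bodies in parallel).
    for i in range(n):
        kernel(i, *args)

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
out = [0.0] * 3
launch(saxpy_kernel, 3, 2.0, x, y, out)
# out is now [6.0, 9.0, 12.0]
```

Compare this with the CPU habit of writing the outer loop yourself: the sequential case becomes the exception you must ask for, which is exactly the reversal the comment describes.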
        
             | marmaduke wrote:
              | > I am a bit sceptical that most people would recognise a
              | subjectively well-done SIMD language (let's say from a
              | correctness and performance standpoint)
             | 
             | I found ISPC (https://ispc.github.io) to be a little easier
             | to work with than CUDA, for SIMD on CPUs, as it sits a bit
             | in the middle of the CUDA and regular C models. Interop can
             | work via C ABI i.e. trivial, it generates regular object
             | files, and so on.
        
             | shmerl wrote:
             | CUDA is still stuck with Nvidia only. So improving on that
             | and on the language itself is clearly an open item.
        
               | Jhsto wrote:
                | Well, oneAPI can be used to do that. This submission is
                | most likely in reference to a post from a few days ago
                | [1]. Here, PTX, which CUDA targets, is retargeted into
                | SPIR-V, which is run atop oneAPI [2].
               | 
               | [1]: https://news.ycombinator.com/item?id=26262038
               | 
               | [2]: https://spec.oneapi.com/level-zero/latest/index.html
        
               | moonbug wrote:
               | but almost no one codes to the CUDA driver api.
        
           | nynx wrote:
            | rust-gpu (https://shader.rs) is a SPIR-V backend for Rust.
           | It's early days, but already really promising. In my opinion,
           | applying a good package manager (cargo) and a good language
           | (rust) to gpgpu and rendering could result in great things.
        
           | oivey wrote:
           | By async compute, do you mean something different than what
           | CUDA streams expose?
           | 
           | AMD's compute cards have generally been worse than Nvidia's
           | as far as I know.
        
             | shmerl wrote:
             | I mean actual hardware capabilities. AMD GPUs were ahead in
             | that.
             | 
                | See: http://developer.amd.com/wordpress/media/2012/10/Asynchronou...
        
               | oivey wrote:
                | This only seems to be applicable to shaders? GPGPU in
                | most
               | contexts is referring to code divorced from any graphics
               | pipeline, like CUDA or ROCm. CUDA has had asynchronous
               | and concurrent kernel calls for a long time. How are
               | asynchronous compute shaders relevant in that context?
        
               | monocasa wrote:
               | Compute shaders are divorced from the graphics pipeline
               | as well. As far as the hardware is concerned, CUDA/ROCm
               | and compute shaders are the same thing.
        
               | Jasper_ wrote:
               | Compute shaders and CUDA have very different execution
               | and driver models. Just take a look at the way memory is
               | managed in each.
        
               | my123 wrote:
                | Were. AMD totally squandered the advantage they had from
                | launching GCN early through an absolutely awful software
                | stack.
        
               | shmerl wrote:
               | RDNA 2 only improved on the above. Nvidia were and remain
               | a dead end with CUDA lock-in, but I agree that there
               | should be more combined efforts to break it.
        
               | my123 wrote:
               | Lol? ROCm is a mess.
               | 
               | It's awful. And it is _not_ even supported on RDNA and
                | RDNA2. They even dropped Polaris support last month, so
                | that it's only supported on Vega.
               | 
               | And that's very much not a high quality GPU compute
               | stack, sorry...
               | 
               | CUDA isn't a dead end and isn't going away any time soon.
        
               | shmerl wrote:
               | I was talking about hardware, not about ROCm. I said
               | above, GPU compute could benefit from a nice programming
               | model with a modern language. That will dislodge CUDA
               | lock-in for good.
               | 
               | CUDA isn't dead, but there should be a bigger effort to
               | get rid of it because it's not a proper GPU programming
               | tool but rather a market manipulation tool by Nvidia.
        
               | my123 wrote:
               | Nowadays, modern Nvidia hardware (Volta onwards) handles
               | compute workloads better, notably through independent
               | thread scheduling.
               | 
               | RDNA2 with its cache and low main memory bandwidth didn't
               | help either, because while the cache is there, you're
               | going to exceed what it can absorb by a lot in compute
               | workloads...
        
       | RocketSyntax wrote:
       | Is "oneAPI Deep Neural Network Library (oneDNN)" an alternative
       | to ONNX?
        
       | DethNinja wrote:
       | Why would someone use this instead of plain old OpenCL(or CUDA)
       | with C++?
       | 
        | What is the value proposition here?
        
         | jpsamaroo wrote:
         | OpenCL and various other solutions basically require that one
         | writes kernels in C/C++. This is an unfortunate limitation, and
         | can make it hard for less experienced users (researchers
         | especially) to write correct and performant GPU code, since
         | neither language lends itself to writing many mathematical and
         | scientific models in a clean, maintainable manner (in my
         | opinion).
         | 
         | What oneAPI (the runtime), and also AMD's ROCm (specifically
         | the ROCR runtime), do that is new is that they enable packages
         | like oneAPI.jl [1] and AMDGPU.jl [2] to exist (both Julia
         | packages), without having to go through OpenCL or C++
         | transpilation (which we've tried out before, and it's quite
         | painful). This is a great thing, because now users of an
         | entirely different language can still utilize their GPUs
         | effectively and with near-optimal performance (optimal w.r.t
         | what the device can reasonably attain).
         | 
         | [1] https://github.com/JuliaGPU/oneAPI.jl [2]
         | https://github.com/JuliaGPU/AMDGPU.jl
        
           | my123 wrote:
              | Which is something that CUDA has provided since the very
              | beginning (with PTX).
        
             | pjmlp wrote:
                | No one will take over CUDA's dominance until they
                | realize that one reason most researchers flocked to it
                | was its polyglot capabilities and graphical debuggers.
        
               | moonbug wrote:
               | in other words, Nvidia's product execution was spot-on.
        
             | jpsamaroo wrote:
             | Yes, and that's why Julia gained CUDA support first. My
             | point was to respond to "Why would someone use this instead
             | of plain old OpenCL(or CUDA) with C++?", and my answer was,
             | "you can use something other than OpenCL C or C++". I'm not
             | trying to say that CUDA is any lesser of a platform because
              | of this; instead, other vendors' GPUs are now becoming
             | easier to use and program.
        
       | gbrown_ wrote:
        | The success of both Intel's oneAPI and AMD's ROCm/HIP stacks is
        | obviously tied to (and part of) the success of each company's
        | future products. And today I'd bet AMD will come off better.
        | They have multiple contracts for HPC systems with their future
        | GPU products as well as an existing presence in consumer gaming
        | products.
        | 
        | Intel's effort seems to be banking on the Aurora system at ANL.
        | Beyond that, I don't know who's lining up to buy Xe GPUs.
        | Though I guess oneAPI will fit into whatever they do further
        | down the line.
        
       ___________________________________________________________________
       (page generated 2021-03-01 23:01 UTC)