[HN Gopher] Nvidia Warp: A Python framework for high performance...
       ___________________________________________________________________
        
       Nvidia Warp: A Python framework for high performance GPU simulation
       and graphics
        
       Author : jarmitage
       Score  : 289 points
       Date   : 2024-06-14 13:28 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | eigenvalue wrote:
       | I really like how nvidia started doing more normal open source
       | and not locking stuff behind a login to their website. It makes
       | it so much easier now that you can just pip install all the cuda
       | stuff for torch and other libraries without authenticating and
       | downloading from websites and other nonsense. I guess they
       | realized that it was dramatically reducing the engagement with
       | their work. If it's open source anyway then you should make it as
       | accessible as possible.
        
         | jjmarr wrote:
         | It being on GitHub doesn't mean it's open-source.
         | 
         | https://github.com/NVIDIA/warp?tab=License-1-ov-file#readme
         | 
         | Looks more "source available" to me.
        
           | nitinreddy88 wrote:
           | That's what open-source means. Source code is open for
           | reading. It has nothing to do with Licensing. You can have
            | any type of license on top of that, based on your business
            | needs.
        
             | dagenix wrote:
             | That may be your definition, but that's not everyone's
             | definition. Wikipedia, for example, says:
             | 
             | > Open-source software (OSS) is computer software that is
             | released under a license in which the copyright holder
             | grants users the rights to use, study, change, and
             | distribute the software and its source code to anyone and
             | for any purpose.
             | 
             | https://en.m.wikipedia.org/wiki/Open-source_software
        
             | j-r-d wrote:
             | No. That's not how it works. It's great that they're making
             | source available but if I can't modify and distribute it,
             | it's not open.
        
             | TimeBearingDown wrote:
             | No. The Open Source Initiative maintains the definition,
             | which is accepted internationally by multiple government
             | agencies.
             | 
             | https://opensource.org/osd
             | 
             | https://opensource.org/authority
        
         | foresterre wrote:
         | I would argue that this isn't "normal open source", though it
         | is indeed not locked behind a login on their website. The
          | license (1) feels very much proprietary, even if the source
         | code is available.
         | 
         | (1) https://github.com/NVIDIA/warp/blob/main/LICENSE.md
        
           | bionhoward wrote:
           | Agreed, especially given this
           | 
           | "2.7 You may not use the Software for the purpose of
           | developing competing products or technologies or assist a
           | third party in such activities."
           | 
           | vs
           | 
           | "California's public policy provides that every contract that
           | restrains anyone from engaging in a lawful profession, trade,
           | or business of any kind is, to that extent, void, except
           | under limited statutory exceptions."
           | 
           | https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml.
           | ..
           | 
           | (Owner/Partner who sold business, may voluntarily agree to a
           | noncompete, (which is now federally
           | https://www.ftc.gov/legal-library/browse/rules/noncompete-
           | ru... banned) is the only exception I found).
           | 
           | I'm not a lawyer. Any lawyers around? Could the 2nd provision
           | invalidate the 1st, or not?
        
             | philipov wrote:
             | You're free to engage in a lawful profession, just not
             | using _that_ Software for it.  "to that extent" is not
             | there merely for show.
        
               | bionhoward wrote:
               | Hey, that's a real argument, and it makes sense. Thank
               | you for helping to clarify this topic.
               | 
               | Question: why would NVIDIA, makers of general
               | intelligence, which seems to compete with everyone,
               | publish code for software nobody can use without breaking
               | NVIDIA rules? Wouldn't it be better for everyone if they
               | just kept that code private?
        
               | bionhoward wrote:
               | ah, just found this license here for another NVIDIA
               | product released today
               | https://developer.download.nvidia.com/licenses/nvidia-
               | open-m... this is way better
        
             | jasongill wrote:
             | FYI, the FTC noncompete rule does not go into effect until
             | September, and it specifically carves out an exception to
             | the rule for existing noncompetes for senior executives
        
         | dagmx wrote:
          | This isn't open source. It's the equivalent of headers being
          | available for a dylib, just that they happen to be a Python API.
         | 
         | Most of the magic is behind closed source components, and it's
         | posted with a fairly restrictive license.
        
           | fragmede wrote:
            | And people say nvidia doesn't have a moat.
        
           | boywitharupee wrote:
           | In a similar fashion, you'll see that JAX has frontend code
           | being open-sourced, while device-related code is distributed
           | as binaries. For example, if you're on Google's TPU, you'll
           | see _libtpu.so_ , and on macOS, you'll see
           | _pjrt_plugin_metal_1.x.dylib_.
           | 
           | The main optimizations (scheduler, vectorizer, etc.) are
           | hidden behind these shared libraries. If open-sourced, they
           | might reveal hints about proprietary algorithms and provide
           | clues to various hardware components, which could potentially
           | be exploited.
        
         | water-your-self wrote:
         | Accessible, as long as you purchase their very contested
         | hardware.
        
       | jorlow wrote:
       | Does this compete at all with openAI's triton (which is sort of a
       | higher level cuda without the vendor lock in)?
        
       | arvinsim wrote:
       | As someone who is not in the simulation and graphic space, what
       | does this library bring that current libraries do not?
        
         | ok123456 wrote:
         | It overlaps a lot with the library Taichi, which Disney
         | supports.
         | 
         | It's noteworthy that Taichi also supports AMD, MPI, and Kokkos.
        
       | paulluuk wrote:
       | While this is really cool, I have to say..
       | 
       | > import warp as wp
       | 
       | Can we please not copy this convention over from numpy? In the
       | example script, you use 17 characters to write this just to save
       | 18 characters later on in the script. Just import the warp
        | commands you use, or just "import warp" if you really want, but
        | please don't rename imported libraries.
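The tradeoff being debated can be sketched with a stdlib module standing in for warp (warp itself isn't assumed installed here); explicit from-imports keep call sites readable without a two-letter alias:

```python
# Explicit imports: each name is spelled out once at the top,
# and call sites need no "wp."-style prefix.
from math import hypot, sqrt

def norm(x: float, y: float) -> float:
    # equivalent to sqrt(x*x + y*y), but hypot avoids overflow
    return hypot(x, y)

print(norm(3.0, 4.0))  # -> 5.0
print(sqrt(2.0) ** 2)  # roughly 2.0, floating-point rounding aside
```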
        
         | dahfizz wrote:
         | Strongly agreed! This convention has even infected internal
         | tooling at my company. Scripts end up with tons of cryptic
         | three letter names. It saves a couple keystrokes but wastes
          | engineering time to maintain.
        
           | physicsguy wrote:
           | The convention is a convention because the libraries are used
           | so commonly. If you give anyone in scientific computing
           | Python world something with "np" or "pd" then they know what
           | that is. Doing something other than what is convention for
           | those libraries wastes more time when people jump into a file
           | because people have to work out now whether "array" is some
           | bespoke type or the NumPy one they're used to.
        
             | paulluuk wrote:
             | There is no way that "warp" is already such a household
             | name that it's common enough to shorten it to "wp".
             | Likewise, the libraries at OP's company are for sure not
             | going to be common to anyone starting out at the company,
             | and might still be confusing to anyone who has worked there
             | for years but just hasn't had to use that specific library.
             | 
             | Pandas and Numpy are popular, sure. As is Tensorflow (often
             | shortened to tf). But where do you draw the line, then?
             | should the openai library be imported as oa? should flask
             | be imported as fk? should requests be imported as rq?
             | 
             | It seems to happen mostly to libraries that are commonly
             | used by one specific audience: scientists who are forced to
             | use a programming language, and who think that 1-letter
             | variables are good variable names, and who prefer using
             | notebooks over scripts with functions.
             | 
             | Don't get me wrong, I'm glad that Python gets so much
             | attention from the scientific community, but I feel that
             | small little annoyances like this creep in because of it,
             | too.
        
           | hot_gril wrote:
           | It doesn't really matter
        
         | dr_kiszonka wrote:
         | Interesting. That is a good point. However, if I saw someone
         | writing numpy.array() or pandas.read_csv(), my first reaction
         | would be to think they were a beginner.
        
         | hoosieree wrote:
         | The more Python I write, the more I feel that "from foo import
         | name1,name2,nameN" is The Way. Yes it's more verbose. Yes it
         | loses any benefits of namespaces. However it encourages you to
         | focus on your actual problem rather than hypothetical problems
         | you might have someday, and the namespace clashes might have a
         | positive unintended consequence of making you realize you don't
         | actually need that other library after all.
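A concrete example of the clash risk (stdlib only): `from math import pow` silently shadows the builtin, changing result types mid-module:

```python
print(pow(2, 3))      # builtin pow -> 8 (an int)

from math import pow  # from this point on, pow is math.pow

print(pow(2, 3))      # math.pow -> 8.0 (always a float)
```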
        
           | water-your-self wrote:
           | > the namespace clashes might have a positive unintended
           | consequence of making you realize you don't actually need
           | that other library after all.
        
         | Y_Y wrote:
         | import warp as np
         | 
         | Now you can re-use your old code as-is!
        
         | 2cynykyl wrote:
          | This math is not adding up for me... isn't "import warp"
          | necessary anyway? So it's only 6 more characters to write "as
          | wp". And to me, the savings in cognitive load later, when I'm
          | in the flow of coding, is worth it.
        
       | w-m wrote:
       | I was playing around with taichi a little bit for a project.
       | Taichi lives in a similar space, but has more than an NVIDIA
       | backend. But its development has stalled, so I'm considering
       | switching to warp now.
       | 
       | It's quite frustrating that there's seemingly no long-lived
       | framework that allows me to write simple numba-like kernels and
       | try them out in NVIDIA GPUs and Apple GPUs. Even with taichi, the
       | Metal backend was definitely B-tier or lower: Not offering 64 bit
       | ints, and randomly crashing/not compiling stuff.
       | 
       | Here's hoping that we'll solve the GPU programming space in the
       | next couple years, but after ~15 years or so of waiting, I'm no
       | longer holding my breath.
       | 
       | https://github.com/taichi-dev/taichi
        
         | panagathon wrote:
         | This is the library I've always wanted. Look at that Julia set.
         | Gorgeous. Thanks for this. I'm sorry to hear about the dev
         | issues. I wish I could help.
        
         | paulmd wrote:
         | the problem with the GPGPU space is that everything except CUDA
         | is so _fractally broken_ that everything eventually converges
         | to the NVIDIA stuff that actually works.
         | 
         | yes, the heterogeneous compute frameworks are largely broken,
         | except for OneAPI, which does work, but only on CUDA. SPIR-V,
         | works best on CUDA. OpenCL: works best on CUDA.
         | 
         | Even once you get past the topline "does it even attempt to
         | support that", you'll find that AMD's runtimes are broken too.
         | Their OpenCL runtime is buggy and has a bunch of paper features
         | which don't work, and a bunch of AMD-specific behavior and bugs
         | that aren't spec-compliant. So basically you have to have an
         | AMD-specific codepath anyway to handle the bugs. Same for
         | SPIR-V: the biggest thing they have working against them is
         | that AMD's Vulkan Compute support is incomplete and buggy too.
         | 
         | https://devtalk.blender.org/t/was-gpu-support-just-outright-...
         | 
         | https://render.otoy.com/forum/viewtopic.php?f=7&t=75411 ("As of
         | right now, the Vulkan drivers on AMD and Intel are not mature
         | enough to compile (much less ship) Octane for Vulkan")
         | 
         | If you are going to all that effort anyway, why are you (a)
         | targeting AMD at all, and (b) why don't you just use CUDA in
         | the first place? So everyone writes more CUDA and nothing gets
         | done. Cue some new whippersnapper who thinks they're gonna cure
         | all AMD's software problems in a month, they bash into the
          | brick wall, write a blog post, become an angry forums
          | commenter, rinse and repeat.
         | 
         | And now you have another abandoned cross-platform project that
         | basically only ever supported NVIDIA anyway.
         | 
         | Intel, bless their heart, is actually trying and their stuff
         | largely does just work, supposedly, although I'm trying to get
         | their linux runtime up and running on a Serpent Canyon NUC with
         | A770m and am having a hell of a time. But supposedly it does
         | work especially on windows (and I may just have to knuckle
         | under and use windows, or put a pcie card in a server pc). But
         | they just don't have the marketshare to make it stick.
         | 
         | AMD is stuck in this perpetual cycle of expecting _anyone_ else
         | but themselves to write the software, and then not even
         | providing enough infrastructure to get people to the starting
         | line, and then surprised-pikachu nothing works, and surprise-
         | pikachu they never get any adoption. Why has nvidia done
         | this!?!?  /s
         | 
         | The other big exception is Metal, which both works and has an
         | actual userbase. The reason they have Metal support for cycles
         | and octane is because _they contribute the code_ , that's
         | really what needs to happen (and I think what Intel is doing -
         | there's just a lot of work to come from zero). But of course
         | Metal is apple-only, so really ideally you would have a layer
         | that goes over the top...
        
         | szvsw wrote:
         | I've been in love with Taichi for about a year now. Where's the
         | news source on development being stalled? It seemed like things
         | were moving along at pace last summer and fall at least if I
         | recall correctly.
        
           | w-m wrote:
           | https://github.com/taichi-dev/taichi/discussions/8506
        
             | szvsw wrote:
             | Ha, interesting timing, last post 6Hr ago. Sounds like they
             | are dogfooding it at least which is good. And I would agree
             | with the assessment that 1.x is fairly feature complete, at
             | least from my experience using it (scientific computing).
             | And good to hear that they are planning on pushing support
              | patches for e.g. Python 3.12, CUDA 12, etc.
        
           | contravariant wrote:
            | There have only been 7 commits to master in the last 6
            | months, half of those purely changes to tests or
            | documentation, so it kind of sounds like you're both right.
        
         | talldayo wrote:
         | > Here's hoping that we'll solve the GPU programming space in
         | the next couple years, but after ~15 years or so of waiting,
         | I'm no longer holding my breath.
         | 
         | It feels like the ball is entirely in Apple's court. Well-
         | designed and Open Source GPGPU libraries exist, even ones that
         | Apple has supported in the past. Nvidia supports many of them,
         | either through CUDA or as a native driver.
        
       | dudus wrote:
       | Gotta keep digging that CUDA moat as hard and as fast as
       | possible.
        
         | astromaniak wrote:
          | Exactly, and that's why it's valued at $3T++, about 10x of AMD
         | and Intel put together.
        
       | VyseofArcadia wrote:
       | Aren't warps already architectural elements of nvidia graphics
       | cards? This name collision is going to muddy search results.
        
         | logicchains wrote:
         | >Aren't warps already architectural elements of nvidia graphics
         | cards?
         | 
         | Architectural elements of _all_ graphics cards.
        
           | VyseofArcadia wrote:
           | Unsure of how authoritative this is, but this article[0]
           | seems to imply it's a matter of branding.
           | 
           | > The efficiency of executing threads in groups, which is
           | known as warps in NVIDIA and wavefronts in AMD, is crucial
           | for maximizing core utilization.
           | 
           | [0] https://www.xda-developers.com/how-does-a-graphics-card-
           | actu...
        
             | logicchains wrote:
             | ROCm also refers to them as warps https://rocm.docs.amd.com
             | /projects/HIP/en/latest/understand/... :
             | 
             | >The threads are executed in groupings called warps. The
             | amount of threads making up a warp is architecture
             | dependent. On AMD GPUs the warp size is commonly 64
             | threads, except in RDNA architectures which can utilize a
             | warp size of 32 or 64 respectively. The warp size of
             | supported AMD GPUs is listed in the Accelerator and GPU
             | hardware specifications. NVIDIA GPUs have a warp size of
             | 32.
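To make the warp/wavefront grouping concrete: launches are scheduled in whole warps, so thread counts effectively round up. A small sketch of that arithmetic in Python, using the sizes from the quote above:

```python
def warps_needed(n_threads: int, warp_size: int = 32) -> int:
    """Warps the hardware schedules for n_threads (ceiling division)."""
    return -(-n_threads // warp_size)

# 1000 logical threads round up to whole warps/wavefronts:
print(warps_needed(1000, 32) * 32)  # NVIDIA: 32 warps -> 1024 hw threads
print(warps_needed(1000, 64) * 64)  # AMD (64-wide): 16 wavefronts -> 1024
```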
        
       | nurettin wrote:
       | How is this different than taichi? Even the decorators look
       | similar.
        
       | raytopia wrote:
       | I love how many python to native/gpu code projects there are now.
       | It's nice to see a lot of competition in the space. An
       | alternative to this one could be Taichi Lang [0] it can use your
       | gpu through Vulkan so you don't have to own Nvidia hardware.
       | Numba [1] is another alternative that's very popular. I'm still
       | waiting on a Python project that compiles to pure C (unlike
       | Cython [2] which is hard to port) so you can write homebrew games
       | or other embedded applications.
       | 
       | [0] https://www.taichi-lang.org/
       | 
       | [1] http://numba.pydata.org/
       | 
       | [2] https://cython.readthedocs.io/en/stable/
        
         | setopt wrote:
         | CuPy is also great - makes it trivial to port existing
          | numerical code from NumPy/SciPy to CUDA, or to write code that
         | can run either on CPU or on GPU.
         | 
         | I recently saw a 2-3 orders of magnitude speed-up of some
         | physics code when I got a mid-range nVidia card and replaced a
         | few NumPy and SciPy calls with CuPy.
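The drop-in quality comes from CuPy mirroring the NumPy namespace; a common portability pattern (a sketch: it assumes NumPy is installed and treats CuPy as optional):

```python
import numpy as np

try:
    import cupy as cp  # needs a CUDA-capable GPU and driver
    xp = cp
except ImportError:
    xp = np  # CPU fallback; same API surface for the calls below

# Numerical code is written once against `xp` and runs on either backend:
a = xp.linspace(0.0, 1.0, 5)
print(xp.sum(a * a))  # -> 1.875
```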
        
           | 6gvONxR4sf7o wrote:
           | Don't forget JAX! It's my preferred library for "i want to
           | write numpy but want it to run on gpu/tpu with auto diff etc"
        
             | westurner wrote:
             | From https://news.ycombinator.com/item?id=37686351 :
             | 
             | >> _sympy.utilities.lambdify.lambdify()https://github.com/s
             | ympy/sympy/blob/a76b02fcd3a8b7f79b3a88df... :_
             | 
             | >> _" ""Convert a SymPy expression into a function that
             | allows for fast numeric evaluation""" [e.g. the CPython
             | math module, mpmath, NumPy, SciPy,_ CuPy, JAX, TensorFlow,
             | _SymPy, numexpr,]_
             | 
             | sympy#20516: "re-implementation of torch-lambdify"
             | https://github.com/sympy/sympy/pull/20516
        
         | skrhee wrote:
         | I would like to warn people away from taichi if possible. At
         | least back in 1.7.0 there were some bugs in the code that made
         | it very difficult to work with.
        
           | hoosieree wrote:
           | Do you have any more specifics about these limitations? I'm
           | considering trying Taichi for a project because it seems to
           | be GPU vendor agnostic (unlike CuPy).
        
             | sinuhe69 wrote:
              | I only dabbled in Taichi, but I find its magic has
              | limitations. I took a provided example, just increased the
              | length of the loop and bam! it crashed the Windows driver.
              | Obviously it ran out of memory, but I have no idea how to
              | adjust that except by experimenting with different values.
              | If it has information about the GPU and its memory, I
              | thought it could automatically adjust the block size, but
              | apparently not. There is a config command to fine-tune the
              | for-loop parallelization, but the docs say we normally do
              | not need to use it.
        
         | szvsw wrote:
         | I'm a huge Taichi stan. So much easier and more elegant than
         | numba. The support for data classes and data_oriented classes
         | is excellent. Being able to define your own memory layouts is
         | extremely cool. Great documentation. Really really recommend!
        
         | Joky wrote:
         | > I'm still waiting on a Python project that compiles to pure C
         | 
         | In case you haven't tried it yet, Pythran is an interesting one
         | to play with: https://pythran.readthedocs.io
         | 
         | Also, not compiling to C but to native code still would be
         | Mojo: https://www.modular.com/max/mojo
        
           | holoduke wrote:
            | Does it really matter for performance? I see Python in these
            | kinds of setups as an orchestrator of computing APIs/engines.
            | For example, from Python you instruct it to compute the
            | following list, etc. There's no hard computing in Python, so
            | performance is not much of an issue.
        
             | crabbone wrote:
             | Marshaling is an issue as well as concurrency.
             | 
             | Simply copying a chunk of data between two libraries
              | through Python is already painful. There is a so-called
              | "buffer API" in Python, but it's very rare that Python
              | users can actually take advantage of this feature. If
              | anything in Python so much as looks at the data, that's not
              | going to work, etc.
             | 
             | Similarly, concurrency. A lot of native libraries for
             | Python are written with the expectation that nothing in
             | Python really runs concurrently. And then you are presented
             | with two bad options: try running in different threads (so
             | that you don't have to copy data), but things will probably
             | break because of races, or run in different processes, and
             | spend most of the time copying data between them. Your
             | interface to stuff like MPI is, again, only at the native
             | level, or you will copy so much that the benefits of
             | distributed computation might not outweigh the downsides of
             | copying.
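The buffer protocol the comment refers to, in its simplest stdlib form: a memoryview is a zero-copy window into another object's buffer (NumPy arrays and many native libraries expose the same protocol):

```python
data = bytearray(b"abcdefgh")

view = memoryview(data)[2:6]  # shares data's buffer; nothing is copied
view[0] = ord("X")            # writes through to the underlying bytearray

print(bytes(data))  # -> b'abXdefgh'
```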
        
           | ok123456 wrote:
           | nuitka already does this
        
         | pjmlp wrote:
         | I would rather that Python catches up with Common Lisp tooling
         | in JIT/AOT in the box, instead of compilation via C.
        
           | heavyset_go wrote:
           | I'd kill for AOT compiled Python. 3.13 ships with a basic JIT
           | compiler.
        
             | pjmlp wrote:
             | In 3.13 you need to compile Python yourself if you want to
             | test the preview JIT.
        
           | crabbone wrote:
           | Just be honest with yourself. You want Java. Or some other
           | JVM language.
           | 
           | Python has no real JIT, Python needs no real JIT.
           | 
           | Python is a gimmick language whose goodness was in showing
           | Java programmers that a master can write a short and concise
           | program that expresses just as much as ten times as much of
           | XML configuration files with a grotesque layering of code
           | abstractions would.
           | 
           | Masters have left Python around 15 years ago. Today it's just
           | a graffiti on a wall of an abandoned house squatted by people
           | who have no idea who made the graffiti and what it was about.
           | It's a joke that stopped being funny so long ago nobody can
           | even remember why it was funny in the first place.
           | 
           | Adding JIT and other nonsense that was added to Python in the
           | last decade is just as stupid as trying to build a spaceship
           | based on a tricycle by incrementally adding missing features.
           | It would've been much easier to just build a spaceship than
           | to modify a tricycle to act like one, especially since you
           | aren't allowed to throw the tricycle away.
        
             | nequo wrote:
             | Where have the masters gone from Python?
        
               | crabbone wrote:
               | Into management? :)
               | 
               | There is no career path for programmers. Once you are a
               | programmer, that's the end of your career. You climb the
               | ladder by starting to manage people. But you don't become
               | a super-programmer.
        
               | trallnag wrote:
               | What about 10x programmers
        
             | pjmlp wrote:
             | Unfortunately there are folks that believe we have to use
             | Python, and then rewrite in C, while calling those wrappers
             | "Python" libraries, regardless of how much I would like to
             | use JVM or CLR instead.
             | 
             | So if Python has taken Lisp's place in AI, I would rather
             | that it also takes the JIT and AOT tooling as well.
        
               | crabbone wrote:
               | Well, then, I guess, I should've prefaced what I wrote
               | with "ideally".
               | 
               | Of course, there are circumstances we, the flesh-and-
               | blood people cannot change, and if Python is all there
               | is, no matter how good or bad it is, then it's probably
               | better to make it do something we need... One could still
               | hope though.
        
             | lwroQA wrote:
             | Yes, there are still a couple of older people around who
             | used DEI-grifting to take control of the Python org. They
             | of course kept the key positions instead of giving them to
             | underprivileged people.
             | 
             | So, a lot of interesting people have left. The scientific
             | ecosystem does its own thing and keeps Python somewhat
             | alive. Not sure why one must write CUDA kernels in Python.
             | Why don't they write a Lisp -> PTX compiler so that one can
             | directly compile formally verified kernels?
        
         | tony69 wrote:
         | https://nuitka.net/ ?
        
       | owenpalmer wrote:
       | > Warp is designed for spatial computing
       | 
       | What does this mean? I've mainly heard the term "spatial
       | computing" in the context of the Vision Pro release. It doesn't
       | seem like this was intended for AR/VR
        
         | educasean wrote:
         | As someone not in this space, I was immediately tripped up by
         | this as well. Does spatial computing mean something else in
         | this context?
        
           | basiccalendar74 wrote:
            | The main use case seems to be simulations in 2D, 3D, or nD
            | spaces; spaces -> spatial.
        
       | water-your-self wrote:
       | >GPU support requires a CUDA-capable NVIDIA GPU and driver
       | (minimum GeForce GTX 9xx).
       | 
       | Very tactful from nvidia. I have a lovely AMD gpu and this
       | library is worthless for it.
        
         | coldtea wrote:
         | Err, it is nvidia. Why would they support AMD?
        
       | jarmitage wrote:
       | > What's Taichi's take on NVIDIA's Warp?
       | 
       | > Overall the biggest distinction as of now is that Taichi
       | operates at a slightly higher level. E.g. implict loop
       | parallelization, high level spatial data structures, direct
       | interops with torch, etc.
       | 
       | > We are trying to implement support for lower level programming
       | styles to accommodate such things as native intrinsics, but we do
       | think of those as more advanced optimization techniques, and at
       | the same time we strive for easier entry and usage for beginners
       | or people not so used to CUDA's programming model
       | 
       | - https://github.com/taichi-dev/taichi/discussions/8184
        
       | BenoitP wrote:
       | This should be seen in light of the Great Differentiable
       | Convergence(tm):
       | 
        | NERFs backpropagating pixel colors into the volume, but also
       | semantic information from the image label, embedded from an LLM
       | reading a multimedia document.
       | 
       | Or something like this. Anyway, wanna buy an NVIDIA GPU ;)?
        
       | wallscratch wrote:
       | Can anyone comment on how efficient the Warp code is compared to
       | manually written / fine-tuned CUDA?
        
       | jokoon wrote:
        | funny that now some software is hardware-dependent
       | 
       | OpenCL seems like it's just obsolete
        
       ___________________________________________________________________
       (page generated 2024-06-14 23:00 UTC)