[HN Gopher] Nvidia Warp: A Python framework for high performance...
___________________________________________________________________
Nvidia Warp: A Python framework for high performance GPU simulation
and graphics
Author : jarmitage
Score : 289 points
Date : 2024-06-14 13:28 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| eigenvalue wrote:
| I really like how nvidia started doing more normal open source
| and not locking stuff behind a login to their website. It makes
| it so much easier now that you can just pip install all the cuda
| stuff for torch and other libraries without authenticating and
| downloading from websites and other nonsense. I guess they
| realized that it was dramatically reducing the engagement with
| their work. If it's open source anyway then you should make it as
| accessible as possible.
| jjmarr wrote:
| It being on GitHub doesn't mean it's open-source.
|
| https://github.com/NVIDIA/warp?tab=License-1-ov-file#readme
|
| Looks more "source available" to me.
| nitinreddy88 wrote:
| That's what open-source means: the source code is open for
| reading. It has nothing to do with licensing. You can have
| any type of license on top of that, based on your business
| needs.
| dagenix wrote:
| That may be your definition, but that's not everyone's
| definition. Wikipedia, for example, says:
|
| > Open-source software (OSS) is computer software that is
| released under a license in which the copyright holder
| grants users the rights to use, study, change, and
| distribute the software and its source code to anyone and
| for any purpose.
|
| https://en.m.wikipedia.org/wiki/Open-source_software
| j-r-d wrote:
| No. That's not how it works. It's great that they're making
| source available but if I can't modify and distribute it,
| it's not open.
| TimeBearingDown wrote:
| No. The Open Source Initiative maintains the definition,
| which is accepted internationally by multiple government
| agencies.
|
| https://opensource.org/osd
|
| https://opensource.org/authority
| foresterre wrote:
| I would argue that this isn't "normal open source", though it
| is indeed not locked behind a login on their website. The
| license (1) feels very much proprietary, even if the source
| code is available.
|
| (1) https://github.com/NVIDIA/warp/blob/main/LICENSE.md
| bionhoward wrote:
| Agreed, especially given this
|
| "2.7 You may not use the Software for the purpose of
| developing competing products or technologies or assist a
| third party in such activities."
|
| vs
|
| "California's public policy provides that every contract that
| restrains anyone from engaging in a lawful profession, trade,
| or business of any kind is, to that extent, void, except
| under limited statutory exceptions."
|
| https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml.
| ..
|
| (The only exception I found: an owner/partner who sold a
| business may voluntarily agree to a noncompete, which is now
| federally banned anyway:
| https://www.ftc.gov/legal-library/browse/rules/noncompete-
| ru...)
|
| I'm not a lawyer. Any lawyers around? Could the 2nd provision
| invalidate the 1st, or not?
| philipov wrote:
| You're free to engage in a lawful profession, just not
| using _that_ Software for it. "to that extent" is not
| there merely for show.
| bionhoward wrote:
| Hey, that's a real argument, and it makes sense. Thank
| you for helping to clarify this topic.
|
| Question: why would NVIDIA, makers of general
| intelligence, which seems to compete with everyone,
| publish code for software nobody can use without breaking
| NVIDIA rules? Wouldn't it be better for everyone if they
| just kept that code private?
| bionhoward wrote:
| ah, just found this license here for another NVIDIA
| product released today
| https://developer.download.nvidia.com/licenses/nvidia-
| open-m... this is way better
| jasongill wrote:
| FYI, the FTC noncompete rule does not go into effect until
| September, and it specifically carves out an exception to
| the rule for existing noncompetes for senior executives
| dagmx wrote:
| This isn't open source. It's the equivalent of headers being
| available for a dylib, just that they happen to be a Python
| API.
|
| Most of the magic is behind closed source components, and it's
| posted with a fairly restrictive license.
| fragmede wrote:
| And people say nvida doesn't have a moat.
| boywitharupee wrote:
| In a similar fashion, you'll see that JAX has frontend code
| being open-sourced, while device-related code is distributed
| as binaries. For example, if you're on Google's TPU, you'll
| see _libtpu.so_ , and on macOS, you'll see
| _pjrt_plugin_metal_1.x.dylib_.
|
| The main optimizations (scheduler, vectorizer, etc.) are
| hidden behind these shared libraries. If open-sourced, they
| might reveal hints about proprietary algorithms and provide
| clues to various hardware components, which could potentially
| be exploited.
| water-your-self wrote:
| Accessible, as long as you purchase their very contested
| hardware.
| jorlow wrote:
| Does this compete at all with OpenAI's Triton (which is sort
| of a higher-level CUDA without the vendor lock-in)?
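|
| For context, a minimal sketch of Triton's block-level model
| (adapted from its canonical vector-add tutorial; sizes here
| are arbitrary, and this is Triton, not Warp, code):
|
|     import torch
|     import triton
|     import triton.language as tl
|
|     @triton.jit
|     def add_kernel(x_ptr, y_ptr, out_ptr, n,
|                    BLOCK: tl.constexpr):
|         pid = tl.program_id(0)    # this program's block id
|         offs = pid * BLOCK + tl.arange(0, BLOCK)
|         mask = offs < n           # guard the ragged tail
|         x = tl.load(x_ptr + offs, mask=mask)
|         y = tl.load(y_ptr + offs, mask=mask)
|         tl.store(out_ptr + offs, x + y, mask=mask)
|
|     x = torch.rand(4096, device="cuda")
|     y = torch.rand(4096, device="cuda")
|     out = torch.empty_like(x)
|     grid = (triton.cdiv(4096, 1024),)
|     add_kernel[grid](x, y, out, 4096, BLOCK=1024)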
| arvinsim wrote:
| As someone who is not in the simulation and graphic space, what
| does this library bring that current libraries do not?
| ok123456 wrote:
| It overlaps a lot with the library Taichi, which Disney
| supports.
|
| It's noteworthy that Taichi also supports AMD, MPI, and Kokkos.
| paulluuk wrote:
| While this is really cool, I have to say..
|
| > import warp as wp
|
| Can we please not copy this convention over from numpy? In the
| example script, you use 17 characters to write this just to save
| 18 characters later on in the script. Just import the warp
| commands you use, or if you really want "import warp", but don't
| rename imported libraries, please.
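|
| To spell out the alternatives (a trivial sketch; wp.zeros and
| wp.float32 are from Warp's array API):
|
|     # 1. aliased import, the numpy-style convention
|     import warp as wp
|     a = wp.zeros(1024, dtype=wp.float32)
|
|     # 2. plain import, no rename
|     import warp
|     a = warp.zeros(1024, dtype=warp.float32)
|
|     # 3. import only the names you actually use
|     from warp import zeros, float32
|     a = zeros(1024, dtype=float32)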
| dahfizz wrote:
| Strongly agreed! This convention has even infected internal
| tooling at my company. Scripts end up with tons of cryptic
| three-letter names. It saves a couple of keystrokes but
| wastes engineering time on maintenance.
| physicsguy wrote:
| The convention is a convention because the libraries are used
| so commonly. If you give anyone in scientific computing
| Python world something with "np" or "pd" then they know what
| that is. Doing something other than what is convention for
| those libraries wastes more time when people jump into a file
| because people have to work out now whether "array" is some
| bespoke type or the NumPy one they're used to.
| paulluuk wrote:
| There is no way that "warp" is already such a household
| name that it's common enough to shorten it to "wp".
| Likewise, the libraries at OP's company are for sure not
| going to be common to anyone starting out at the company,
| and might still be confusing to anyone who has worked there
| for years but just hasn't had to use that specific library.
|
| Pandas and Numpy are popular, sure. As is Tensorflow (often
| shortened to tf). But where do you draw the line, then?
| should the openai library be imported as oa? should flask
| be imported as fk? should requests be imported as rq?
|
| It seems to happen mostly to libraries that are commonly
| used by one specific audience: scientists who are forced to
| use a programming language, and who think that 1-letter
| variables are good variable names, and who prefer using
| notebooks over scripts with functions.
|
| Don't get me wrong, I'm glad that Python gets so much
| attention from the scientific community, but I feel that
| little annoyances like this creep in because of it, too.
| hot_gril wrote:
| It doesn't really matter
| dr_kiszonka wrote:
| Interesting. That is a good point. However, if I saw someone
| writing numpy.array() or pandas.read_csv(), my first reaction
| would be to think they were a beginner.
| hoosieree wrote:
| The more Python I write, the more I feel that "from foo import
| name1,name2,nameN" is The Way. Yes it's more verbose. Yes it
| loses any benefits of namespaces. However it encourages you to
| focus on your actual problem rather than hypothetical problems
| you might have someday, and the namespace clashes might have a
| positive unintended consequence of making you realize you don't
| actually need that other library after all.
| water-your-self wrote:
| > the namespace clashes might have a positive unintended
| consequence of making you realize you don't actually need
| that other library after all.
| Y_Y wrote:
| import warp as np
|
| Now you can re-use your old code as-is!
| 2cynykyl wrote:
| This math is not adding up for me... isn't "import warp"
| necessary anyway? So it's only 6 more characters to write "as
| wp". And anyway, for me the savings in cognitive load later,
| when I'm in the flow of coding, are worth it.
| w-m wrote:
| I was playing around with Taichi a little bit for a project.
| It lives in a similar space, but supports more than just an
| NVIDIA backend. Its development has stalled, though, so I'm
| considering switching to Warp now.
|
| It's quite frustrating that there's seemingly no long-lived
| framework that allows me to write simple numba-like kernels
| and try them out on NVIDIA GPUs and Apple GPUs. Even with
| Taichi, the Metal backend was definitely B-tier or lower: not
| offering 64-bit ints, and randomly crashing or failing to
| compile things.
|
| Here's hoping that we'll solve the GPU programming space in the
| next couple years, but after ~15 years or so of waiting, I'm no
| longer holding my breath.
|
| https://github.com/taichi-dev/taichi
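|
| For reference, the kind of simple kernel I mean, in Taichi (a
| minimal sketch):
|
|     import taichi as ti
|     ti.init(arch=ti.gpu)  # CUDA/Metal/Vulkan, falls back to CPU
|
|     x = ti.field(dtype=ti.f32, shape=1024)
|
|     @ti.kernel
|     def scale(v: ti.f32):
|         for i in x:  # outermost loop is auto-parallelized
|             x[i] = v * i
|
|     scale(2.0)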
| panagathon wrote:
| This is the library I've always wanted. Look at that Julia set.
| Gorgeous. Thanks for this. I'm sorry to hear about the dev
| issues. I wish I could help.
| paulmd wrote:
| the problem with the GPGPU space is that everything except CUDA
| is so _fractally broken_ that everything eventually converges
| to the NVIDIA stuff that actually works.
|
| yes, the heterogeneous compute frameworks are largely broken,
| except for OneAPI, which does work, but only on CUDA. SPIR-V
| works best on CUDA. OpenCL works best on CUDA.
|
| Even once you get past the topline "does it even attempt to
| support that", you'll find that AMD's runtimes are broken too.
| Their OpenCL runtime is buggy and has a bunch of paper features
| which don't work, and a bunch of AMD-specific behavior and bugs
| that aren't spec-compliant. So basically you have to have an
| AMD-specific codepath anyway to handle the bugs. Same for
| SPIR-V: the biggest thing they have working against them is
| that AMD's Vulkan Compute support is incomplete and buggy too.
|
| https://devtalk.blender.org/t/was-gpu-support-just-outright-...
|
| https://render.otoy.com/forum/viewtopic.php?f=7&t=75411 ("As of
| right now, the Vulkan drivers on AMD and Intel are not mature
| enough to compile (much less ship) Octane for Vulkan")
|
| If you are going to all that effort anyway, why are you (a)
| targeting AMD at all, and (b) why don't you just use CUDA in
| the first place? So everyone writes more CUDA and nothing gets
| done. Cue some new whippersnapper who thinks they're gonna
| cure all AMD's software problems in a month; they bash into
| the brick wall, write a blog post, become an angry forum
| commenter, rinse and repeat.
|
| And now you have another abandoned cross-platform project that
| basically only ever supported NVIDIA anyway.
|
| Intel, bless their heart, is actually trying, and their stuff
| largely does just work, supposedly, although I'm trying to
| get their Linux runtime up and running on a Serpent Canyon
| NUC with an A770m and am having a hell of a time. It does
| reportedly work, especially on Windows (and I may just have
| to knuckle under and use Windows, or put a PCIe card in a
| server PC). But they just don't have the market share to make
| it stick.
|
| AMD is stuck in this perpetual cycle of expecting _anyone_ else
| but themselves to write the software, and then not even
| providing enough infrastructure to get people to the starting
| line, and then surprised-pikachu nothing works, and surprise-
| pikachu they never get any adoption. Why has nvidia done
| this!?!? /s
|
| The other big exception is Metal, which both works and has an
| actual userbase. The reason they have Metal support for cycles
| and octane is because _they contribute the code_ , that's
| really what needs to happen (and I think what Intel is doing -
| there's just a lot of work to come from zero). But of course
| Metal is apple-only, so really ideally you would have a layer
| that goes over the top...
| szvsw wrote:
| I've been in love with Taichi for about a year now. Where's the
| news source on development being stalled? It seemed like things
| were moving along at pace last summer and fall at least if I
| recall correctly.
| w-m wrote:
| https://github.com/taichi-dev/taichi/discussions/8506
| szvsw wrote:
| Ha, interesting timing, last post 6 hours ago. Sounds like
| they are dogfooding it at least, which is good. And I would
| agree with the assessment that 1.x is fairly feature-
| complete, at least from my experience using it (scientific
| computing). It's also good to hear that they are planning to
| push support patches for e.g. Python 3.12, CUDA 12, etc.
| contravariant wrote:
| There have only been 7 commits to master in the last 6
| months, half of those purely changes to tests or
| documentation, so it kind of sounds like you're both right.
| talldayo wrote:
| > Here's hoping that we'll solve the GPU programming space in
| the next couple years, but after ~15 years or so of waiting,
| I'm no longer holding my breath.
|
| It feels like the ball is entirely in Apple's court. Well-
| designed and Open Source GPGPU libraries exist, even ones that
| Apple has supported in the past. Nvidia supports many of them,
| either through CUDA or as a native driver.
| dudus wrote:
| Gotta keep digging that CUDA moat as hard and as fast as
| possible.
| astromaniak wrote:
| Exactly, and that's why it's valued at $3T++, about 10x AMD
| and Intel put together.
| VyseofArcadia wrote:
| Aren't warps already architectural elements of nvidia graphics
| cards? This name collision is going to muddy search results.
| logicchains wrote:
| >Aren't warps already architectural elements of nvidia graphics
| cards?
|
| Architectural elements of _all_ graphics cards.
| VyseofArcadia wrote:
| Unsure of how authoritative this is, but this article[0]
| seems to imply it's a matter of branding.
|
| > The efficiency of executing threads in groups, which is
| known as warps in NVIDIA and wavefronts in AMD, is crucial
| for maximizing core utilization.
|
| [0] https://www.xda-developers.com/how-does-a-graphics-card-
| actu...
| logicchains wrote:
| ROCm also refers to them as warps https://rocm.docs.amd.com
| /projects/HIP/en/latest/understand/... :
|
| >The threads are executed in groupings called warps. The
| amount of threads making up a warp is architecture
| dependent. On AMD GPUs the warp size is commonly 64
| threads, except in RDNA architectures which can utilize a
| warp size of 32 or 64 respectively. The warp size of
| supported AMD GPUs is listed in the Accelerator and GPU
| hardware specifications. NVIDIA GPUs have a warp size of
| 32.
| nurettin wrote:
| How is this different from Taichi? Even the decorators look
| similar.
| raytopia wrote:
| I love how many python to native/gpu code projects there are now.
| It's nice to see a lot of competition in the space. An
| alternative to this one could be Taichi Lang [0] it can use your
| gpu through Vulkan so you don't have to own Nvidia hardware.
| Numba [1] is another alternative that's very popular. I'm still
| waiting on a Python project that compiles to pure C (unlike
| Cython [2] which is hard to port) so you can write homebrew games
| or other embedded applications.
|
| [0] https://www.taichi-lang.org/
|
| [1] http://numba.pydata.org/
|
| [2] https://cython.readthedocs.io/en/stable/
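|
| As a taste of the Numba flavor of this (a minimal sketch;
| note @njit compiles to native code via LLVM rather than
| emitting C):
|
|     from numba import njit
|     import numpy as np
|
|     @njit  # type-specialized and compiled on first call
|     def dot(a, b):
|         s = 0.0
|         for i in range(a.shape[0]):
|             s += a[i] * b[i]
|         return s
|
|     v = np.arange(1e6)
|     print(dot(v, v))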
| setopt wrote:
| CuPy is also great - makes it trivial to port existing
| numerical code from NumPy/SciPy to CUDA, or to write code that
| can run either on CPU or on GPU.
|
| I recently saw a 2-3 orders of magnitude speed-up of some
| physics code when I got a mid-range nVidia card and replaced a
| few NumPy and SciPy calls with CuPy.
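|
| A minimal sketch of what such a port looks like, using
| cupy.get_array_module for CPU/GPU-agnostic code:
|
|     import numpy as np
|     import cupy as cp
|
|     def rms(x):
|         xp = cp.get_array_module(x)  # returns numpy or cupy
|         return float(xp.sqrt(xp.mean(x ** 2)))
|
|     a_cpu = np.random.rand(1_000_000)
|     a_gpu = cp.asarray(a_cpu)        # host -> device copy
|     print(rms(a_cpu), rms(a_gpu))    # same code path either way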
| 6gvONxR4sf7o wrote:
| Don't forget JAX! It's my preferred library for "i want to
| write numpy but want it to run on gpu/tpu with auto diff etc"
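|
| A minimal sketch of that workflow:
|
|     import jax
|     import jax.numpy as jnp
|
|     @jax.jit                    # XLA-compiles for CPU/GPU/TPU
|     def loss(w, x):
|         return jnp.sum((x @ w) ** 2)
|
|     grad_loss = jax.grad(loss)  # autodiff w.r.t. first argument
|
|     w = jnp.ones(3)
|     x = jnp.arange(12.0).reshape(4, 3)
|     print(loss(w, x), grad_loss(w, x))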
| westurner wrote:
| From https://news.ycombinator.com/item?id=37686351 :
|
| >> sympy.utilities.lambdify.lambdify() (https://github.com/s
| ympy/sympy/blob/a76b02fcd3a8b7f79b3a88df...):
|
| >> """Convert a SymPy expression into a function that allows
| for fast numeric evaluation""" [e.g. the CPython math module,
| mpmath, NumPy, SciPy, CuPy, JAX, TensorFlow, SymPy, numexpr]
|
| sympy#20516: "re-implementation of torch-lambdify"
| https://github.com/sympy/sympy/pull/20516
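|
| A minimal lambdify example:
|
|     import numpy as np
|     import sympy as sp
|
|     x = sp.symbols("x")
|     f = sp.lambdify(x, sp.sin(x) ** 2, "numpy")  # expr -> NumPy func
|     print(f(np.linspace(0.0, 1.0, 5)))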
| skrhee wrote:
| I would like to warn people away from taichi if possible. At
| least back in 1.7.0 there were some bugs in the code that made
| it very difficult to work with.
| hoosieree wrote:
| Do you have any more specifics about these limitations? I'm
| considering trying Taichi for a project because it seems to
| be GPU vendor agnostic (unlike CuPy).
| sinuhe69 wrote:
| I only dabbled in Taichi, but I find its magic has
| limitations. I took a provided example, just increased the
| length of the loop, and bam! It crashed the Windows driver.
| Obviously it ran out of memory, but I have no idea how to
| adjust for that except by experimenting with different
| values. Since it has information about the GPU and its
| memory, I thought it could automatically adjust the block
| size, but apparently not. There is a config option to fine-
| tune the for-loop parallelization, but the docs say we
| normally do not need to use it.
| szvsw wrote:
| I'm a huge Taichi stan. So much easier and more elegant than
| numba. The support for data classes and data_oriented classes
| is excellent. Being able to define your own memory layouts is
| extremely cool. Great documentation. Really really recommend!
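|
| For the curious, a minimal data_oriented sketch:
|
|     import taichi as ti
|     ti.init()
|
|     @ti.data_oriented
|     class Grid:
|         def __init__(self, n):
|             self.v = ti.field(ti.f32, shape=n)  # field owned by class
|
|         @ti.kernel
|         def clear(self):
|             for i in self.v:  # parallel over the field
|                 self.v[i] = 0.0
|
|     g = Grid(256)
|     g.clear()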
| Joky wrote:
| > I'm still waiting on a Python project that compiles to pure C
|
| In case you haven't tried it yet, Pythran is an interesting one
| to play with: https://pythran.readthedocs.io
|
| Also, not compiling to C but to native code still would be
| Mojo: https://www.modular.com/max/mojo
| holoduke wrote:
| Does it really matter for performance? I see Python in these
| kinds of setups as an orchestrator of computing APIs/engines.
| From Python you just instruct the engine what to compute. No
| hard computing happens in Python, so its performance is not
| much of an issue.
| crabbone wrote:
| Marshaling is an issue as well as concurrency.
|
| Simply copying a chunk of data between two libraries
| through Python is already painful. There is a so-called
| "buffer API" in Python, but it's very rare that Python users
| can actually take advantage of it. If anything in Python so
| much as looks at the data, zero-copy sharing stops working.
|
| Similarly, concurrency. A lot of native libraries for
| Python are written with the expectation that nothing in
| Python really runs concurrently. And then you are presented
| with two bad options: try running in different threads (so
| that you don't have to copy data), but things will probably
| break because of races, or run in different processes, and
| spend most of the time copying data between them. Your
| interface to stuff like MPI is, again, only at the native
| level, or you will copy so much that the benefits of
| distributed computation might not outweigh the downsides of
| copying.
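|
| For what it's worth, the buffer protocol does allow zero-copy
| views when both sides cooperate; a minimal sketch:
|
|     import numpy as np
|
|     buf = bytearray(4 * 8)       # raw memory, e.g. from a C library
|     a = np.frombuffer(buf, dtype=np.float64)  # zero-copy view
|     a[:] = 1.5                   # writes land in buf, no copy made
|     print(bytes(buf[:8]))        # raw bytes of the first float64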
| ok123456 wrote:
| nuitka already does this
| pjmlp wrote:
| I would rather that Python catches up with Common Lisp tooling
| in JIT/AOT in the box, instead of compilation via C.
| heavyset_go wrote:
| I'd kill for AOT compiled Python. 3.13 ships with a basic JIT
| compiler.
| pjmlp wrote:
| In 3.13 you need to compile Python yourself if you want to
| test the preview JIT.
| crabbone wrote:
| Just be honest with yourself. You want Java. Or some other
| JVM language.
|
| Python has no real JIT, Python needs no real JIT.
|
| Python is a gimmick language whose goodness was in showing
| Java programmers that a master can write a short, concise
| program expressing as much as ten times its size in XML
| configuration files and grotesquely layered code
| abstractions.
|
| Masters have left Python around 15 years ago. Today it's just
| a graffiti on a wall of an abandoned house squatted by people
| who have no idea who made the graffiti and what it was about.
| It's a joke that stopped being funny so long ago nobody can
| even remember why it was funny in the first place.
|
| Adding JIT and other nonsense that was added to Python in the
| last decade is just as stupid as trying to build a spaceship
| based on a tricycle by incrementally adding missing features.
| It would've been much easier to just build a spaceship than
| to modify a tricycle to act like one, especially since you
| aren't allowed to throw the tricycle away.
| nequo wrote:
| Where have the masters gone from Python?
| crabbone wrote:
| Into management? :)
|
| There is no career path for programmers. Once you are a
| programmer, that's the end of your career. You climb the
| ladder by starting to manage people. But you don't become
| a super-programmer.
| trallnag wrote:
| What about 10x programmers
| pjmlp wrote:
| Unfortunately there are folks that believe we have to use
| Python, and then rewrite in C, while calling those wrappers
| "Python" libraries, regardless of how much I would like to
| use JVM or CLR instead.
|
| So if Python has taken Lisp's place in AI, I would rather
| that it also takes the JIT and AOT tooling as well.
| crabbone wrote:
| Well, then, I guess, I should've prefaced what I wrote
| with "ideally".
|
| Of course, there are circumstances we, the flesh-and-
| blood people cannot change, and if Python is all there
| is, no matter how good or bad it is, then it's probably
| better to make it do something we need... One could still
| hope though.
| lwroQA wrote:
| Yes, there are still a couple of older people around who
| used DEI-grifting to take control of the Python org. They
| of course kept the key positions instead of giving them to
| underprivileged people.
|
| So, a lot of interesting people have left. The scientific
| ecosystem does its own thing and keeps Python somewhat
| alive. Not sure why one must write CUDA kernels in Python.
| Why don't they write a Lisp -> PTX compiler so that one can
| directly compile formally verified kernels?
| tony69 wrote:
| https://nuitka.net/ ?
| owenpalmer wrote:
| > Warp is designed for spatial computing
|
| What does this mean? I've mainly heard the term "spatial
| computing" in the context of the Vision Pro release. It doesn't
| seem like this was intended for AR/VR
| educasean wrote:
| As someone not in this space, I was immediately tripped up by
| this as well. Does spatial computing mean something else in
| this context?
| basiccalendar74 wrote:
| The main use case seems to be simulations in 2D, 3D, or nD
| spaces. Spaces -> spatial.
| water-your-self wrote:
| >GPU support requires a CUDA-capable NVIDIA GPU and driver
| (minimum GeForce GTX 9xx).
|
| Very tactful from nvidia. I have a lovely AMD gpu and this
| library is worthless for it.
| coldtea wrote:
| Err, it is nvidia. Why would they support AMD?
| jarmitage wrote:
| > What's Taichi's take on NVIDIA's Warp?
|
| > Overall the biggest distinction as of now is that Taichi
| operates at a slightly higher level. E.g. implicit loop
| parallelization, high level spatial data structures, direct
| interops with torch, etc.
|
| > We are trying to implement support for lower level programming
| styles to accommodate such things as native intrinsics, but we do
| think of those as more advanced optimization techniques, and at
| the same time we strive for easier entry and usage for beginners
| or people not so used to CUDA's programming model
|
| - https://github.com/taichi-dev/taichi/discussions/8184
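|
| For comparison, Warp's more explicit, CUDA-flavored style
| looks roughly like this (a sketch based on the README
| examples):
|
|     import numpy as np
|     import warp as wp
|
|     wp.init()
|
|     @wp.kernel
|     def saxpy(a: float,
|               x: wp.array(dtype=float),
|               y: wp.array(dtype=float)):
|         i = wp.tid()  # explicit per-thread index, CUDA-style
|         y[i] = a * x[i] + y[i]
|
|     n = 1024
|     x = wp.array(np.ones(n, dtype=np.float32), device="cuda")
|     y = wp.array(np.zeros(n, dtype=np.float32), device="cuda")
|     wp.launch(saxpy, dim=n, inputs=[2.0, x, y])  # n threads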
| BenoitP wrote:
| This should be seen in light of the Great Differentiable
| Convergence(tm):
|
| NeRFs backpropagating pixel colors into the volume, but also
| semantic information from the image label, embedded by an LLM
| reading a multimedia document.
|
| Or something like this. Anyway, wanna buy an NVIDIA GPU ;)?
| wallscratch wrote:
| Can anyone comment on how efficient the Warp code is compared to
| manually written / fine-tuned CUDA?
| jokoon wrote:
| Funny that now some software is hardware-dependent.
|
| OpenCL seems like it's just obsolete.
___________________________________________________________________
(page generated 2024-06-14 23:00 UTC)