[HN Gopher] Numba: A High Performance Python Compiler
___________________________________________________________________
Numba: A High Performance Python Compiler
Author : tosh
Score : 188 points
Date : 2022-12-27 13:36 UTC (9 hours ago)
(HTM) web link (numba.pydata.org)
(TXT) w3m dump (numba.pydata.org)
| itamarst wrote:
| Quick overview of the design space:
|
| * PyPy JITs everything, so it can make _normal_ Python numerical
| code quite fast, as well as regular Python code.
| However, its interactions with libraries like NumPy add overhead,
| and it seems like it can't JIT code that interacts with NumPy in
| a useful way (AFAIK; I'd be happy to be proven wrong). So it's
| not useful for optimizing numeric functions that interact with
| libraries like NumPy.
|
| * Plain old NumPy and friends. This is great... if the operation
| you want is already available as a "vectorized" API. "Vectorized"
| in this context does NOT mean SIMD, it's a Python-specific usage,
| see below.
|
| * Numba: JIT compilation specifically focusing on interop with
| NumPy and similar libraries. Lets you write a subset of Python,
| but unlike NumPy you can use for loops and still go fast.
|
| * AOT compilation: Cython, Rust, C++, etc.. You have a longer
| feedback loop, but you have a full programming language,
| especially if you avoid Cython. OTOH Cython has nicer Python
| interop so for simple just-a-little-addon it can be easier to use
| if you don't already know Rust. You really shouldn't be writing
| new C++ in this day and age (but wrapping an existing library is
| useful). Like C++, Cython doesn't help with memory safety. Cython
| also suffers from two compilers, so debugging can be harder,
| especially if you use the C++ interop; if you are wrapping
| existing C++ library, I'd probably start with PyBind11 based on
| long-ago experience with Boost::Python.
|
| Longer form:
|
| * "Vectorization" in the context of Python:
| https://pythonspeed.com/articles/vectorization-python/
|
| * PyPy and Numba as alternatives to vectorization:
| https://pythonspeed.com/articles/vectorization-python-altern...
|
| * Choosing a compiled language:
| https://pythonspeed.com/articles/rust-cython-python-extensio...
|
| * The performance overhead of AOT compiled libraries (less
| relevant if you're doing anything numeric):
| https://pythonspeed.com/articles/python-extension-performanc...
|
| * Numba intro: https://pythonspeed.com/articles/numba-faster-
| python/
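| To make the Numba-vs-vectorized-NumPy distinction concrete, here
| is a minimal toy sketch (my own example, not from the linked
| articles; it falls back to plain Python when numba isn't
| installed):

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # run as plain (slow) Python if numba is absent
    def njit(func):
        return func

# Vectorized NumPy: fast only because np.minimum and .sum each
# dispatch to a prebuilt C loop.
def clipped_sum_numpy(a):
    return np.minimum(a, 1.0).sum()

# Numba style: an explicit for loop, which plain CPython would run
# slowly but @njit compiles to machine code.
@njit
def clipped_sum_loop(a):
    total = 0.0
    for x in a:
        total += x if x < 1.0 else 1.0
    return total

a = np.array([0.5, 2.0, 0.25])
print(clipped_sum_numpy(a), clipped_sum_loop(a))  # both 1.75
```

| The point of the overview above: the loop version is exactly the
| style NumPy alone punishes you for, and exactly what Numba makes
| fast.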
| hedgehog wrote:
| Good overview. "Vectorized" is an old term that's been around
| since the early days of supercomputers and maybe before, not
| sure where it came from. Numba does a bunch of different things
| for code written to the Numpy API including CUDA acceleration.
| Certain machine learning frameworks like PyTorch and JAX also
| roughly follow the Numpy API because it is widely familiar and
| easy enough to work with. The kind of code that benefits from
| this kind of acceleration is hard to write yourself. A lot of
| workloads lean on linear algebra operations that are
| conceptually simple but complicated to implement with good
| performance, which is why all of this tooling isn't just a couple
| thousand lines of C. Good overview of matmul on CPU:
|
| https://gist.github.com/nadavrot/5b35d44e8ba3dd718e595e40184...
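| As a concrete illustration of "conceptually simple but
| complicated to implement with good performance": the textbook
| matmul is a few lines, yet it is orders of magnitude slower than
| the tuned BLAS behind np.dot (a toy sketch, not code from the
| linked gist):

```python
import numpy as np

def matmul_naive(A, B):
    """Textbook triple loop: correct, but far slower than a tuned
    BLAS, which blocks for cache and uses SIMD and threads."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

rng = np.random.default_rng(0)
A, B = rng.random((8, 8)), rng.random((8, 8))
print(np.allclose(matmul_naive(A, B), A @ B))  # True
```

| Same answer, wildly different speed at scale - which is the gap
| all of this tooling exists to close.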
| chestertn wrote:
| I will save you the pain: switch to Julia.
| m_c_g wrote:
| Indeed! Converting one's entire code base to a different
| language ecosystem, finding equivalents to each of your third-
| party dependencies, is less painful than employing a library to
| selectively compile a few performance bottlenecks in your code.
|
| (Modules like PyJulia facilitate a more incremental approach.)
| Alifatisk wrote:
| /s
| xigoi wrote:
| That's why you should switch before creating the codebase in
| the first place.
| IceHegel wrote:
| Is there a standard set of benchmarks these python JIT projects
| use?
|
| I'm very interested in adding something like this to some
| projects but it needs to be 10-100x faster to be worth the
| hassle. Otherwise, for our applications, it's a better time
| investment to rewrite in Go and get the speed and pro tooling
| than to further optimize python.
| ellisv wrote:
| I'm surprised you'd rewrite in Go rather than Julia. I'd expect
| Julia would be much easier to translate to from Python and have
| much better support for any mathematical operation.
| doliveira wrote:
| Lol, matrix arithmetic and scientific programming in Go
| hedgehog wrote:
| If you have numeric code that's too slow in Numba, your next
| stop will likely involve a big multi-language effort and GPU
| specialists, and none of that would be in Go except maybe a
| wrapper for your apps.
| chazeon wrote:
| Software from our group (cij[1], qha[2]) was developed when
| numba seemed to be the best option for JIT. In hindsight it
| generated more pain than it was worth: it emits a lot of
| deprecation warnings due to an unstable API, it locked numpy to
| a certain version (I remember 1.21) due to compatibility
| issues, and when the M1 Mac came out there was for a long time
| no llvmlite port to the new platform, so it couldn't run on
| those new Macs.
|
| If I had to do it again I would just use plain numpy, or JAX
| from Google if JIT is really necessary.
|
| [1]: https://github.com/MineralsCloud/cij
|
| [2]: https://github.com/MineralsCloud/qha
| gjvc wrote:
| What if I'm (in Python) doing non-numerical stuff like parsing
| text and generating code? What JIT / AOT tooling (if any) is
| suitable?
| fwilliams wrote:
| I have personally gotten a lot of mileage from just writing
| the compute heavy parts of my code in C++ and exposing it to
| Python with a tool like PyBind11 [1] or NumpyEigen [2]. I
| find tools like numba and cython to be more trouble than
| they're worth.
|
| [1] https://github.com/pybind/pybind11 [2]
| https://github.com/fwilliams/numpyeigen
| netjiro wrote:
| I prototype in python or whatever, then, if the project
| survives into market and has legs I either buy more
| hardware or rewrite the expensive parts in C++.
|
| Reduces calendar time, risk, cost. And I'm likely to make
| better decisions once the code and market is better
| understood after the prototype is tested under real world
| conditions and the requirements have changed (like they
| always seem to do).
| singhrac wrote:
| As a slight contrast to the other responses, I found setting
| up maturin (Rust + Python) very straightforward since the
| documentation is recent, and I find it's easy to write
| parsers in Rust because the ADT syntax is very terse.
| auxym wrote:
| Pypy, probably. You could also consider writing pre compiled
| extensions for your "hot" code, eg. in Cython.
| chazeon wrote:
| I think most parsing-heavy code just uses C/C++ extensions.
|
| Examples I can think of include:
|
| 1. pyyaml's parser in C vs the pure-Python version gets a huge
| speedup on large files
|
| 2. parsing a table (~GB size) with pandas vs self-implemented
| Python code with a lot of for loops gains at least a 20x
| speedup.
| ptype wrote:
| I think what will be the most maintainable and bring you the
| least long-term pain is Cython.
| [deleted]
| baggiponte wrote:
| I am really intrigued by the Codon project, which aims to be a
| JIT compiler for Python with Numba/JAX decorator syntax:
| https://github.com/exaloop/codon
| ipsum2 wrote:
| It's not going to take off, since it doesn't have full (or even
| most) API compatibility with Python. Numba seems strictly
| better because it can interop with Python.
| stared wrote:
| As a side note, now it is easy to write Rust code, which can be
| directly used in Python - https://github.com/PyO3/pyo3.
|
| It cannot use NumPy and other libraries (since it is Rust), but
| at the same time, I see its potential for creating high-
| performance code to be used in a Python numerical environment.
| kylebarron wrote:
| On the contrary, it can use and interface with numpy quite
| easily: https://github.com/PyO3/rust-numpy
| stared wrote:
| Good to know!
| grej wrote:
| We were very heavy numba users at my former company. I would even
| go so far as to say numba was probably the biggest computational
| enabler for the product. I've also made a small contribution to
| the library.
|
| It's a phenomenal library for developing novel computationally
| intensive algorithms on numpy arrays. It's also more versatile
| than Jax.
|
| In presentations, I've heard Leland McInnes credit numba often
| when speaking about his development of UMAP. We built a very
| computationally intensive portion of our application with it and
| it has been running in production, stable, for several years now.
|
| It's not suitable for all use cases. But I recommend testing it
| if you need to do somewhat complex calculations iterating over
| numpy arrays for which standard numpy or scipy functions don't
| exist. Even then, we were often surprised that we could speed
| up some of those calculations by moving them into numba.
|
| Edit: ex of a very small function I wrote with numba that speeds
| up an existing numpy function (note - written years ago, and
| numba has undergone quite a lot of changes since!):
| https://github.com/grej/pure_numba_alias_sampling
|
| Disclosure - I now work for Anaconda, the company that sponsors
| the numba project.
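| For readers unfamiliar with alias sampling: it precomputes two
| tables so that each weighted draw costs O(1). A generic sketch of
| Walker's method (not the linked implementation, which adds the
| numba acceleration):

```python
import numpy as np

def alias_setup(probs):
    """Precompute Walker's alias tables in O(n)."""
    n = len(probs)
    q = np.asarray(probs, dtype=np.float64) * n  # scaled probabilities
    J = np.zeros(n, dtype=np.int64)              # alias indices
    small = [i for i in range(n) if q[i] < 1.0]
    large = [i for i in range(n) if q[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        J[s] = l                 # l donates mass to fill s's cell
        q[l] -= 1.0 - q[s]
        (small if q[l] < 1.0 else large).append(l)
    return J, q

def alias_draw(J, q, rng):
    """Draw one sample in O(1): pick a cell, flip a biased coin."""
    i = rng.integers(len(J))
    return i if rng.random() < q[i] else J[i]

rng = np.random.default_rng(0)
J, q = alias_setup([0.1, 0.2, 0.7])
samples = [alias_draw(J, q, rng) for _ in range(1000)]
```

| The per-draw loop is exactly the kind of index-heavy scalar code
| numba compiles well, which is presumably why it pays off in the
| linked repo.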
| PheonixPharts wrote:
| > It's also more versatile than Jax
|
| Does numba do automatic differentiation?
|
| I view JAX as primarily an automatic differentiation tool with
| the bonus that it makes great use of XLA and can easy make use
| of GPU/TPUs.
|
| I don't usually see numba and JAX as solving the same problem,
| but would be excited to be wrong
| fasttriggerfish wrote:
| [dead]
| melony wrote:
| These days I have switched to
|
| https://www.taichi-lang.org/
| melling wrote:
| [flagged]
| dang wrote:
| That's a bit too cynical, I think. People post follow-
| up/related stories because the brain likes to follow chains of
| associations.
|
| You're right that these chains tend towards already-familiar
| associations, which lower their value as HN stories. The best
| HN stories are the ones that can't be predicted from any
| existing sequence!
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...
| inflationwatch wrote:
| [dead]
| _Wintermute wrote:
| I've only used numba once but I was really impressed. We have an
| analysis at work that runs hundreds of times a day which uses a
| Hampel filter written in numpy, but still requires iterating over
| an array. Just adding a @numba.jit decorator above the function
| gave us a 10x speed improvement.
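| The pattern looks roughly like this (a hypothetical sketch of a
| Hampel filter, not the parent's actual code; the jit fallback
| lets it run even without numba installed):

```python
import numpy as np

try:
    from numba import njit as jit
except ImportError:  # run as plain (slow) Python if numba is absent
    def jit(f):
        return f

@jit
def hampel(x, window, t=3.0):
    """Replace any point more than t scaled-MADs away from the
    rolling median with that median."""
    out = x.copy()
    for i in range(window, len(x) - window):
        w = x[i - window:i + window + 1]
        med = np.median(w)
        mad = np.median(np.abs(w - med))
        if np.abs(x[i] - med) > t * 1.4826 * mad:
            out[i] = med
    return out

x = np.array([1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0])
print(hampel(x, window=2))  # the 10.0 spike is replaced by 1.0
```

| The explicit per-element loop is what plain numpy can't express
| cheaply, and decorating it is the whole integration effort.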
| Escapado wrote:
| When I wrote my bachelor thesis years back I worked on a
| particle-in-cell code [1] that makes heavy use of numba for GPU
| kernels. At the time it was the most convenient way to do that
| from python. I remember spending weeks optimizing these kernels
| to eke out every last bit of performance I could (which
| interestingly enough did eventually involve using atomic
| operations and introducing a lot of variables[2] instead of using
| arrays everywhere to keep things in registers instead of slower
| caches).
|
| I remember the team being really responsive to feature requests
| back then and I had a lot of fun working with it. IIRC compared
| to using numpy we managed to get speedups of up to 60x for the
| most critical pieces of code.
|
| [1]: https://github.com/fbpic/fbpic [2]:
| https://github.com/fbpic/fbpic/blob/1867a4f216baf4269f2314ab...
| lvass wrote:
| Does anyone know how this approach of adding decorators to
| numerical functions compare to Elixir's Nx approach of compiling
| those functions through a specialized macro for numerical
| computations? Would Numba benefit if (PEP 638?) macros were added
| to python?
| usgroup wrote:
| I think numba still makes sense for loopy algorithms, but not
| so much if you're more vector-oriented, given that Jax is more
| or less a drop-in replacement for numpy and is shockingly fast.
| galangalalgol wrote:
| I have used pytorch as a (almost) drop in replacement for
| numpy. Are there good reasons to look at jax instead assuming
| I'm doing DSP and not ML?
| martinsmit wrote:
| If you are doing array or vector-based work where the
| operations can be written as maps as opposed to for loops
| then JAX is king imo.
| patrickkidger wrote:
| Honestly, the two are now incredibly close.
|
| JAX introduced a lot of cool concepts (e.g. autobatching
| (vmap), autoparallel (pmap)) and supported a lot of things
| that PyTorch didn't (e.g. forward mode autodiff).
|
| And at least for my applications (scientific computing), it
| was much faster (~100x) due to a much better JIT compiler and
| reduced Python overhead.
|
| ...but! PyTorch has worked hard to introduce all of the
| former, and the recent PyTorch 2 announcement was primarily
| about a better JIT compiler for PyTorch. (I don't think
| anyone has done serious non-ML benchmarks for this though, so
| it remains to be seen how this holds up.)
|
| There are still a few differences. E.g. JAX has a better
| differential equation solving ecosystem. PyTorch has a better
| protein language model ecosystem. JAX offers some better
| power-user features like custom vmap rules. PyTorch probably
| has a lower barrier to entry.
|
| (FWIW I don't know how either hold up specifically for DSP.)
|
| I'd honestly suggest just trying both; always nice to have a
| broader selection of tools available.
| short_sells_poo wrote:
| As someone who uses the python numerical computing libraries
| extensively, Numba is my biggest disappointment in the ecosystem.
|
| The main problem with Numba is that simple functions are easy
| enough, and this lulls you into a false sense of security: that
| things will work.
|
| Unfortunately, every time it turns into a hair-tearing exercise
| of trying to structure the code so that Numba's vast array of
| unpredictable edge cases isn't hit.
|
| The error messages are often infuriatingly bad.
|
| At this point I've banned Numba from our codebase. If there's a
| case for Numba, we just do it in C++ instead.
|
| Edit: we've been looking at Taichi https://www.taichi-lang.org/
| samsquire wrote:
| How would this compare to Pypy?
|
| I don't think Pypy uses LLVM, so I wonder which produces better
| code.
|
| That said, they're targeted at different audiences. I feel Numba
| is targeted at data science and machine learning and even AI.
|
| I feel a large portion of using or programming a computer is
| structural and not the actual work of adding numbers together.
| Very little of the code generated does the useful part a computer
| does: addition. The rest is control flow management and data
| placement! It's all preparation for the code to do an addition.
| The hard part is putting together the structure for the computer
| to do things that are useful.
|
| So we invented methods, variables, classes, functions, closures,
| and expressions to make creating that structure easier.
|
| I thought about creating a language which tries to eliminate the
| structure that most programs accumulate and focus on the critical
| addition or calculation and let the computer do the arrangement.
| A JIT compiler for structure.
| csdvrx wrote:
| > let the computer do the arrangement
|
| Isn't that constraint propagation?
|
| I'm discovering JS at the moment. I don't fully understand the
| async model, but the promise seems like a generic constraint of
| "the result is now available"
|
| Maybe you could have the "flow managements" as other
| constraints?
| samsquire wrote:
| Thank you for your reply.
|
| I'm thinking the code for your average CRUD or even desktop
| compositor. A compositor copies pixels from multiple places
| into one place. Surely that can be defined with a simple
| loop? But no, there are hundreds of APIs in the way. Add Wayland
| and X11 and you have something that is opaque and understood
| by very few people.
|
| The motivation behind my comment was that most of programming
| computers is gluing together APIs to shift data from one
| place to another before doing something useful with it. The
| APIs themselves do very little addition or subtraction of
| data; they mostly just move data around and place it in the
| right place.
|
| Maybe defining where things should be, declaratively, in
| order to do a calculation would be useful. So the shape of
| the calculation defines the data structure, rather than the
| data structure defining the calculation.
| csdvrx wrote:
| For a compositor, I'd think of the set of pixels being changed
| (an "invalidation") as a good example: the constraint would be
| to update it on the screen.
|
| Unchanged? Don't bother, leave it as-is. I think that's how
| Intel power saving works.
|
| Now think about the MVC model: some changes in the data
| could result in a change in the view if the data currently
| shown on screen is what has changed - like triggers in SQL.
|
| I wonder if you could have everything work like that?
| samsquire wrote:
| You're right, and thank you for bringing async up.
|
| And thank you for bringing up constraint propagation.
|
| One of my ideas is the definition of formulas that act as
| materialized views over other materialized views. So we
| can layer materialized views over other materialized
| views and then work out a derived formula that is
| potentially nearer to what we want and potentially
| summarise the formula without needing to calculate the
| underlying views, we can compute the formula directly.
|
| Is this differential dataflow?
|
| I think it's an application of algebra and JIT compilers
| could do it to expressions if we fed symbolic expressions
| of programming languages into sympy or machine algebra.
|
| In react, react does diffing between virtual DOM nodes to
| see if there are changes. There is also dirty region
| checking in old games and damage regions. These problems
| are mathematically defined.
|
| Here's my writings on the idea
| https://github.com/samsquire/ideas4#31-algebraic-
| materialise...
| csdvrx wrote:
| > I think it's an application of algebra and JIT
| compilers could do it to expressions if we fed symbolic
| expressions of programming languages into sympy or
| machine algebra
|
| Yes, and the constraints could then be used to reduce the
| computational costs, giving higher performance and lower
| latency.
|
| A while back, a good friend (we even shared HN accounts
| for a while lol) pointed me to pipelinedb: a PostgreSQL
| timeseries plugin for continuously updating
| """materialized views"""
|
| I use a lot of quotes around it, because it wasn't like either
| a regular view (computed when you query it, which
| introduces latency) or a materialized view (frozen, needs
| to be refreshed, same problem) but more like the NO_HZ
| tickless kernel: the update of the calculations was
| caused by the introduction of new data, not the passage
| of time (which would be wasteful)
|
| The general approach makes a lot of sense to me, and I
| see how it could be used for more generic problems.
| optimalsolver wrote:
| I went out and learned C++ because Numba was so finicky to work
| with.
| dang wrote:
| Related:
|
| _Faster Python calculations with Numba_ -
| https://news.ycombinator.com/item?id=30392367 - Feb 2022 (66
| comments)
|
| _Numba: a JIT compiler for Python that works best on code that
| uses NumPy_ - https://news.ycombinator.com/item?id=21614533 - Nov
| 2019 (9 comments)
|
| _How Numba and Cython speed up Python code_ -
| https://news.ycombinator.com/item?id=17678758 - Aug 2018 (45
| comments)
|
| _Numba: High-Performance Python with CUDA Acceleration_ -
| https://news.ycombinator.com/item?id=15301766 - Sept 2017 (62
| comments)
|
| _Numba - JIT specializing compiler for annotated Python and
| NumPy code to LLVM_ -
| https://news.ycombinator.com/item?id=5927787 - June 2013 (8
| comments)
|
| _Accelerating Python Libraries with Numba (Part 2)_ -
| https://news.ycombinator.com/item?id=5757231 - May 2013 (23
| comments)
|
| _Accelerating Python Libraries with Numba_ -
| https://news.ycombinator.com/item?id=5680722 - May 2013 (30
| comments)
|
| _Numba: NumPy-aware optimizing compiler for Python_ -
| https://news.ycombinator.com/item?id=4430780 - Aug 2012 (23
| comments)
|
| _NumPy aware dynamic Python compiler using LLVM_ -
| https://news.ycombinator.com/item?id=3864659 - April 2012 (9
| comments)
|
| _Numba - A NumPy aware (LLVM-based) optimizing compiler for
| Python_ - https://news.ycombinator.com/item?id=3692055 - March
| 2012 (6 comments)
| voz_ wrote:
| Very impressive project. If compiling Python interests you, check
| out the pytorch compiler stack too!
|
| https://pytorch.org/get-started/pytorch-2.0/
| micheles wrote:
| I use numba a lot nowadays. It works perfectly well on all
| platforms (linux, windows, mac, even the M1) and gives speedups
| as expected (a few percent for already well-vectorized numpy
| code, and extra-large speedups for loopy code). I strongly
| recommend it for the performance-critical part of your code.
| Many things are not supported yet, so it has to be used with
| care. I remember I needed a missing scipy special function and
| in the end I implemented it myself by vectorizing math.erf: it
| was surprisingly easy to do and a big success in terms of
| performance.
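| For reference, vectorizing math.erf with numba looks something
| like this (my reconstruction, not the parent's code; it falls
| back to np.vectorize when numba isn't installed):

```python
import math
import numpy as np

try:
    from numba import vectorize

    # Compile a NumPy ufunc from the scalar math.erf.
    @vectorize(["float64(float64)"])
    def erf(x):
        return math.erf(x)
except ImportError:
    erf = np.vectorize(math.erf)  # slow fallback, same call shape

x = np.linspace(-2.0, 2.0, 5)
print(erf(x))  # erf(0.0) is exactly 0.0 at the midpoint
```

| With numba present, the result behaves like a native ufunc: it
| broadcasts, accepts scalars or arrays, and runs the compiled
| scalar kernel in a tight loop.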
___________________________________________________________________
(page generated 2022-12-27 23:01 UTC)