[HN Gopher] I don't like NumPy
___________________________________________________________________
I don't like NumPy
Author : MinimalAction
Score : 300 points
Date : 2025-05-15 16:05 UTC (6 hours ago)
(HTM) web link (dynomight.net)
(TXT) w3m dump (dynomight.net)
| dima55 wrote:
| Hear hear! Some of these complaints have been resolved with
| numpysane: https://github.com/dkogan/numpysane/ . With numpysane
| and gnuplotlib, I now find numpy acceptable and use it heavily
| for everything. But yeah; without these it's unusable.
| alanbernstein wrote:
| Thanks for the link, I have also grumbled about these issues,
| but never thought to look for a workaround library on top of
| numpy...
| blululu wrote:
| The author brings up some fair points. I feel like I had all
| sorts of small grievances transitioning from Matlab to Numpy.
| Slicing data still feels worse on Numpy than Matlab or Julia, but
| this doesn't justify the licensing costs for the Matlab
| stats/signal processing toolbox.
|
| The issues presented in this article mostly relate to
| tensors of rank >2. NumPy was originally just matrices, so it
| is not surprising that it has problems in this domain. A dedicated
| library like Torch is certainly better. But Torch is difficult in
| its own ways. IDK, I guess the author's conclusion that numpy is
| "the worst array language other than all the other array
| languages" feels correct. Maybe a lack of imagination on my part.
| jampekka wrote:
| Numpy was about N-dimensional arrays from the get-go. It was a
| continuation of numarray, which was N-d.
| theLiminator wrote:
| Anyone use xarray? Curious how it compares?
| apetrov wrote:
| My experience was positive in the context of PyMC and ArviZ (as
| a way to represent posterior and posterior predictive
| distributions). The API is sane and quite easy to get around.
| xg15 wrote:
| Looks great, but doesn't support pytorch yet, only numpy, or
| does it?
|
| https://github.com/pydata/xarray/issues/3232
| rrr_oh_man wrote:
| Coming from the excellent tidyverse or even data.table in R,
| Numpy always has felt like twenty steps back into mindfuck
| territory.
| acc_297 wrote:
| There's also the fantastic "tidytable" library. I wouldn't want
| to implement multi-headed self attention in either of those
| libraries though.
|
| I've done only very simple ML stuff using R and it never feels
| like exactly the right tool.
| frakt0x90 wrote:
| I had to write a normal codebase in R exactly one time and it
| was one of the weirdest and most unpleasant coding
| experiences I've had. After that, I decided R _is_ tidyverse
| and a handful of stats libraries. If you need more, reach for
| another tool.
| lottin wrote:
| I never understood the appeal of tidyverse. I have a
| colleague who, like you, thinks that R is tidyverse. He
| also has the nasty habit of starting Excel formulas with a
| '+' sign.
| jampekka wrote:
| The main attraction of tidyverse is that it's easy to
| copy-paste code for common cases. If there's no ready
| recipe for something, it's usually just not done, or some
| "technical person" is called to do it.
|
| R is used mostly as a fancy calculator, which is fine in
| itself, but it makes the comparisons to general purpose
| languages like Python quite apples-to-oranges.
| jampekka wrote:
| Numpy and tidyverse/data.table are not really on the same level
| of abstraction. Something like Pandas would be (and it
| definitely has its warts).
|
| Doing the lower level stuff that NumPy is used for is really
| mindfuck territory in R.
| zahlman wrote:
| TFA keeps repeating "you can't use loops", but aren't they, like,
| _merely_ less performant? I understand that there are going to be
| people out there doing complex algorithms (perhaps part of an ML
| system) where that performance is crucial and you might as well
| not be using NumPy in the first place if you skip any
| opportunities to do things in The Clever NumPy Way. But say I'm
| just, like, processing a video frame by frame, by using TCNW on
| each frame and iterating over the time dimension; surely that
| won't matter?
|
| Also: TIL you can apparently use _multi-dimensional_ NumPy arrays
| as NumPy array indexers, and they don't just collapse into
| 1-dimensional iterables. I expected `A[:,i,j,:]` not to work, or
| to be the same as if `j` were just `(0, 1)`. But instead, it
| apparently causes transposition with the previous dimension... ?
| nyeah wrote:
| Right, you can use loops. But then it goes much slower than a
| GPU permits.
| zahlman wrote:
| My point is that plenty of people use NumPy for reasons that
| have nothing to do with a GPU.
| crazygringo wrote:
| The whole point of NumPy is to make things much, much
| faster than interpreted Python, whether you're GPU-
| accelerated or not.
|
| Even code you write now, you may need to GPU accelerate
| later, as your simulations grow.
|
| Falling back on loops is against the entire reason of using
| NumPy in the first place.
| nyeah wrote:
| I really disagree. That's not the only point of NumPy. A
| lot of people use it like Matlab, to answer questions
| with minimal coding time, not minimal runtime.
| crazygringo wrote:
| I mean sure, the fact that it is performant means tons of
| functionality is built on it that is hard to find
| elsewhere.
|
| But the point is still that the main purpose in building
| it was to be performant. To be accelerated. Even if
| that's not why you're personally using it.
|
| I mean, I use my M4 Mac's Spotlight to do simple
| arithmetic. That's not the main point in building the M4
| chip though.
| nyeah wrote:
| As best I can see, you weren't originally talking about
| the reason for creating NumPy. Instead you seemed to be
| talking about the reason for using NumPy.
| nyeah wrote:
| I mean yes. Also in your example where you hardly spend any
| time running Python code, the performance difference likely
| wouldn't matter.
| cycomanic wrote:
| But once you need to use the GPU you need to go to another
| framework anyway (e.g. jax, tensorflow, arrayfire, numba...).
| AFAIK many of those can parallelize loops using their jit
| functionality (in fact, e.g. numba's jit for a long time could
| not deal with numpy broadcasting, so you had to write out your
| loops). So you're not really running into a problem?
| yongjik wrote:
| "merely less performant" is severely underselling it. It could
| easily add a few zeros to your execution time.
|
| (And that's before you even consider GPUs.)
| KeplerBoy wrote:
| It's a slippery slope. Sometimes a python loop outside some
| numpy logic is fine but it's insane how much perf you can leave
| on the table if you overdo it.
|
| It's not just python adding interpreter overhead; you also risk
| creating a lot of temporary arrays, i.e. costly mallocs and
| memcopies.
| jerf wrote:
| You can draw out a sort of performance hierarchy, from fastest
| to slowest:
|
|       * Optimized GPU code
|       * CPU vectorized code
|       * Static CPU unvectorized code
|       * Dynamic CPU code
|
| where the last one refers to the fact that a language like
| Python, in order to add two numbers together in its native,
| pure-Python mode, does a lot of boxing, unboxing, resolving of
| class types and checking for overrides, etc.
|
| Each of those is _at least_ an order of magnitude slower than
| the next one up the hierarchy, and most of them appreciably
| more than one. You're probably better off thinking of them as
| more like 1.5 orders of magnitude apart, as a sort of
| back-of-the-envelope understanding.
|
| Using NumPy incorrectly can accidentally take you from the top
| one, all the way to the bottom one, in one fell swoop. That can
| be a big deal, real quick. Or real slow, as the case may be.
|
| In more complicated scenarios, it matters how much computation
| is going how far down that hierarchy. If by "processing a video
| frame by frame" you mean something like "I wrote a for loop on
| the frames but all the math is still in NumPy", you've taken
| "iterating on frames" from the top to the bottom, but who
| cares, Python can iterate on even a million things plenty
| quickly, especially with everything else that is going on. If,
| by contrast, you mean that at some point you're iterating over
| each pixel in pure Python, you just fell all the way down that
| hierarchy for each _pixel_, and you're in bigger trouble.
|
| In my opinionated opinion, the trouble isn't so much that it's
| possible to fall down that stack. That is arguably a feature,
| after all; surely we should have the capability of doing that
| sort of thing if we want. The problem is how easy it is to do
| without realizing it, just by using Python in what looks like
| perfectly sensible ways. If you aren't a systems engineer it
| can be hard to tell you've fallen, and even if you are, honestly,
| the docs don't make it particularly easy to figure out.
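|
| A toy illustration of the frames-vs-pixels point (made-up
| shapes, not from the thread):
|
|       import numpy as np
|
|       frames = np.random.rand(100, 480, 640)
|
|       # Fine: Python loops 100 times, NumPy does the per-frame work
|       means = [frame.mean() for frame in frames]
|
|       # Trouble: ~30 million trips through the interpreter
|       total = 0.0
|       for frame in frames:
|           for row in frame:
|               for px in row:
|                   total += px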
| HdS84 wrote:
| As a simple example: once upon a time we needed to
| generate a sort of heat map. Doing it in pure python took a
| few seconds at the desired size (a few thousand cells where
| each cell needs a small formula). Dropping to numpy brought
| that down to hundreds of milliseconds. Pushing it to pure C
| got us to tens of milliseconds.
| DrFalkyn wrote:
| Yeah, one of the other beauties of numpy is that you can pass
| data to/from native shared libraries compiled from C code with
| little overhead. This was more kludgy in Matlab last I
| checked.
| maximilianroos wrote:
| that's a great hierarchy!
|
| though what does "static cpu" vs "dynamic cpu" mean? it's one
| thing to be pointer chasing and missing the cache like OCaml
| can, it's another to be running a full interpreter loop to
| add two numbers like python does
| coolcase wrote:
| Plus it isn't a checkbox on a UI where Electron being 1000
| times slower (1 ms instead of 1 µs) would be noticeable.
|
| It could be a 12 hour run vs. 12000000 hour run.
| complex_pi wrote:
| NumPy allows a lot of science to happen. Grievance is fine but a
| little respect is as well.
|
| NumPy is the lingua franca for storing and passing arrays in
| memory in Python.
|
| Thank you NumPy!
| tptacek wrote:
| I get the impression the author is pretty familiar with what
| NumPy is.
| jampekka wrote:
| And given the ending of TFA the author would likely agree
| with GP.
| rekenaut wrote:
| This is a great aspect of it, but that doesn't diminish that it
| feels so tedious to work with compared to Julia and Matlab.
| Some of that is just from trying to shoehorn Python into being a
| scientific computing language, but I think it's time to revisit
| whether Python should have first party support for
| vectorization, arrays in memory, and broadcasting as Julia and
| Matlab have.
| complex_pi wrote:
| NumPy prompted a PEP for the in-memory representation of
| arrays in Python. This is tremendously useful for making APIs.
| jampekka wrote:
| I've never understood the "so tedious" argument of Python vs
| Matlab. Sure, having to import numpy and use np.array to
| create numpy arrays is a bit of typing, but other than that I
| don't see major differences.
|
| Given what an inconsistent mess Matlab is aside from matrix
| manipulation, I'll take NumPy any day.
| nayuki wrote:
| I skimmed the article and agree with it at a high level, though I
| haven't faced most of those problems personally.
|
| I have several gripes with NumPy, or more broadly the notion of
| using Python to call a C/asm library that vectorizes math
| operations. A lot of people speak of NumPy like the solution to
| all your high-performance math needs in Python, which I think is
| disingenuous. The more custom logic you do, the less suitable the
| tools get. Pure Python numeric code is incredibly slow - like
| 1/30x compared to Java - and as you find parts that can't be
| vectorized, you have to drop back down to pure Python.
|
| I would like to give the simple example of the sieve of
| Eratosthenes:
|
|       def sieve_primeness(limit):
|           result = [False] + [True] * limit
|           for i in range(2, len(result)):
|               if result[i]:
|                   for j in range(i * i, len(result), i):
|                       result[j] = False
|
| This code is scalar, and porting it to a language like
| C/C++/Rust/Java gives decent performance straight out of the box.
| Performance in Python is about 1/30x the speed, which is not
| acceptable.
|
| People pretend that you can hand-wave the performance problem
| away by using NumPy. Please tell me how to vectorize this Python
| code. Go on, I'm waiting.
|
| You can't vectorize the `if result[i]` because that controls
| whether the inner for-loop executes; so it must execute in pure
| Python. For the inner loop, you can vectorize that by creating a
| huge temporary array and then AND it with the result array, but
| that is extremely memory-inefficient compared to flipping bits of
| the result array in place, and probably messes up the cache too.
|
| Alternatively, you can run the code in PyPy, but that usually
| gives a speed-up of maybe 3x, which is nowhere near enough to
| recover the 30x speed penalty.
|
| Another example is that NumPy is not going to help you implement
| your own bigint library, because that also requires a lot of
| custom logic that executes between loop iterations. Meanwhile,
| I've implemented bigint in pure Java with acceptable performance
| and without issues.
|
| Again, my point is that anytime you venture into custom
| numerical/logical code that is not the simple `C = A + B`, you
| enter a world of pain when it comes to Python performance or
| vectorization.
| brosco wrote:
| Have you heard of JIT libraries like numba
| (https://github.com/numba/numba)? It doesn't work for all
| python code, but can be helpful for the type of function you
| gave as an example. There's no need to rewrite anything, just
| add a decorator to the function. I don't really know how
| performance compares to C, for example.
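|
| A minimal sketch of that (assuming numba is installed; the
| sieve is rewritten over a NumPy array so nopython mode accepts
| it):
|
|       import numpy as np
|       from numba import njit
|
|       @njit  # compiles the Python loops to machine code on first call
|       def sieve_primeness(limit):
|           result = np.ones(limit + 1, dtype=np.bool_)
|           result[:2] = False
|           for i in range(2, len(result)):
|               if result[i]:
|                   for j in range(i * i, len(result), i):
|                       result[j] = False
|           return result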
| Jtsummers wrote:
| I'm not a numpy expert, I just use it on occasion but this
| works and eliminates the explicit inner loop, but not the outer
| loop. It collects the list of prime numbers while removing
| multiples of the prime from the numpy array of numbers:
|
|       import numpy as np
|
|       def sieve_primes(limit):
|           nums = np.arange(limit + 1)
|           nums[1] = 0
|           primes = []
|           for i in range(2, limit):
|               if nums[i] != 0:  # could just be `if nums[i]:` since 0 is false-y
|                   primes.append(i)
|                   nums = nums * (np.mod(nums, i) != 0)
|           print(primes)
|
|       sieve_primes(1000)
|
| This takes advantage of the fact that True and False are
| treated as 1 and 0 in Python for the multiplication.
|
| EDIT: The other solutions people have shown are closer to the
| sieve itself than my solution. I was having trouble with the
| slicing notation, now that I've seen how that should be done
| I'd redo the above as:
|
|       def sieve_primes(limit):
|           nums = np.arange(limit + 1)
|           nums[1] = 0
|           for i in range(2, limit):
|               if nums[i] != 0:
|                   nums[i+i::i] = 0
|           print(np.nonzero(nums))
|
| However, the first version is faster by my testing so I'd end
| up just going back to it if I really cared about performance.
| xg15 wrote:
| > _Please tell me how to vectorize this Python code. Go on,
| I'm waiting._
|
| You can't entirely get rid of the loops, but at least the inner
| one can be vectorized, I think. That would reduce the "pythonic
| runtime" from squared to linear.
|
| Maybe something like this?
|
|       result = np.full((limit+1,), True, dtype=bool)
|       result[0] = False
|       indices = np.arange(len(result))
|       for i in range(2, len(result)):
|           result[indices[i:] * i] = False
|
| This is just off the top of my head, so it may contain bugs.
| Also, the indices array may need to be clamped; I'm not sure
| right now if numpy accepts out-of-bounds indices.
| nonameiguess wrote:
| Ironically enough, I wrote a Sieve in NumPy 12 years ago or so
| in grad school messing around with Project Euler. I haven't
| touched it since and have not regularly used either NumPy or
| even Python in quite some time, but I still have the solutions
| in a private Github repo. The code is:
|
|       def primes(n):
|           """Input: A natural number n.
|           Output: An array of primes, p < n."""
|           assert n >= 2
|           sieve = np.ones(n / 2, dtype=np.bool)
|           for i in range(3, int(n ** 0.5) + 1, 2):
|               if sieve[i / 2]:
|                   sieve[i * i / 2::i] = False
|           return np.r_[2, 2 * np.nonzero(sieve)[0][1::] + 1]
|
| It still relies on looping and I have no idea if you can fully
| vectorize everything, but it does at least get quite a bit of
| speedup thanks to the ability to index one array using another,
| so I can avoid the inner loop you have to use in pure Python.
|
| I have absolutely no clue what that final line is doing and I'm
| not sure I understood it 12 years ago, either.
| gjm11 wrote:
| sieve[i] is nonzero precisely when 2i+1 is an odd prime _or_
| i=0 (because it's initialized to all-1 and nothing ever
| zeros the first element).
|
| So np.nonzero(sieve) is all the indices for which i=0 or 2i+1
| is an odd prime.
|
| Except that the behaviour of np.nonzero is a bit weird: it
| returns a tuple of arrays, one for each axis. (So, e.g., if
| you feed it a 2d array it gives you (a,b) such that the
| nonzero elements are at positions (a[0],b[0]), (a[1],b[1]),
| etc.) That's kinda reasonable behaviour, but it means that
| for the simple case of a 1-d array you get a 1-element tuple.
|
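| (Concretely, with a small made-up 1-d example:)
|
|       import numpy as np
|
|       sieve = np.array([True, False, True, True, False])
|       np.nonzero(sieve)      # (array([0, 2, 3]),) -- a 1-element tuple
|       np.nonzero(sieve)[0]   # array([0, 2, 3])
|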
| So np.nonzero(sieve)[0] is _actually_ all the indices i for
| which i=0 or 2i+1 is an odd prime, and now you know why I
| didn't write "all the indices i for which ..." above.
|
| The case i=0 would correspond to 1 being prime, which it
| isn't considered to be. So we throw that one out.
|
| So np.nonzero(sieve)[0][1::] is all the indices i for which
| 2i+1 is an odd prime.
|
| And so 2*np.nonzero(sieve)[0][1::]+1 is all the 2i+1 for
| which 2i+1 is an odd prime; that is, all the odd primes.
|
| I don't think I've encountered np.r_ before, but it's pretty
| obvious what it's doing: prepending 2 to that array, so now
| it's "2 followed by all the odd primes" or, more simply, all
| the primes.
|
| (Obviously, in the above phrases like "all the primes" mean
| "all the primes up to the limit defined by the size of the
| array".)
| WCSTombs wrote:
| > Please tell me how to vectorize this Python code. Go on, I'm
| waiting.
|
| Here's a start:
|
|       import numpy as np
|
|       def sieve(limit):
|           result = np.ones(limit + 2, dtype=bool)
|           result[0] = False
|           result[1] = False
|           for i in range(2, limit + 1):
|               if result[i]:
|                   yield i
|                   result[::i] = False
|
|       for prime in sieve(100):
|           print(prime)
|
| As you said, you can't vectorize the outer loop because each
| iteration depends on the result of previous iterations, so I
| think you can't do too much better than that with this
| algorithm. (There are a few easy algorithm optimizations, but I
| think that's kind of orthogonal to your point, and anyway you
| can do them in any language.)
|
| I would expect there's still a significant performance gap
| between that and, say, a simple C implementation, but that
| shouldn't be surprising, and personally I haven't encountered
| these supposed droves of people claiming that NumPy fully
| solves the performance gap. From what I've seen there was a
| general consensus that vectorizing with NumPy can significantly
| tighten the gap but can rarely fully close it.
| __mharrison__ wrote:
| See Numba or Cython...
| DrFalkyn wrote:
| Back when our lab transitioned from Matlab to Python I used
| numpy/scipy quite a bit. I remember heavily leaning on
| numpy.reshape to get things to work correctly. In some cases I
| did resort to looping.
| brosco wrote:
| Compared to Matlab (and to some extent Julia), my complaints
| about numpy are summed up in these two paragraphs:
|
| > Some functions have axes arguments. Some have different
| versions with different names. Some have Conventions. Some have
| Conventions and axes arguments. And some don't provide any
| vectorized version at all.
|
| > But the biggest flaw of NumPy is this: Say you create a
| function that solves some problem with arrays of some given
| shape. Now, how do you apply it to particular dimensions of some
| larger arrays? The answer is: You re-write your function from
| scratch in a much more complex way. The basic principle of
| programming is abstraction--solving simple problems and then
| using the solutions as building blocks for more complex problems.
| NumPy doesn't let you do that.
|
| Usually when I write Matlab code, the vectorized version just
| works, and if there are any changes needed, they're pretty minor
| and intuitive. With numpy I feel like I have to look up the
| documentation for every single function, transposing and
| reshaping the array into whatever shape that particular function
| expects. It's not very consistent.
| breakds wrote:
| For the second problem, if I understand it correctly, you might
| want to try `vmap` from jax.
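|
| A minimal sketch (hypothetical shapes, mirroring the batched
| solve example from TFA):
|
|       import jax
|       import jax.numpy as jnp
|
|       def solve_one(A, x):               # written for a single 5x5 system
|           return jnp.linalg.solve(A, x)
|
|       solve_batched = jax.vmap(solve_one)   # lifted over a leading batch axis
|
|       A = jnp.tile(jnp.eye(5), (100, 1, 1))
|       x = jnp.ones((100, 5))
|       y = solve_batched(A, x)               # y has shape (100, 5)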
| jampekka wrote:
| Matlab's support for more than 2 dimensions in arrays is so bad
| that it's rare to encounter the situations lamented in TFA.
| IshKebab wrote:
| Well it's not Tenslab!
| treefarmer wrote:
| Yeah, in case anyone else has the misfortune of having to
| work with multi-dimensional data in MATLAB, I'd recommend the
| Tensor Toolbox, Tensorlab, or the N-way Toolbox.
| phronimos wrote:
| Numba is a great option for speeding up (vectorizing) loops and
| NumPy code, apart from CuPy and JAX. Xarray is also worth trying
| for tensors beyond 2 dimensions.
| KeplerBoy wrote:
| True, a nice jit compiler solves a lot of the problems
| mentioned in the article. These days I often use jax.jit for
| the gpu support and numpy-like syntax, with the added benefit
| of fast loop constructs.
| vector_spaces wrote:
| My main issue with numpy is that it's often unclear what
| operations will be vectorized or how they will be vectorized, and
| you can't be explicit about it the way you can with Julia's dot
| syntax.
|
| There are also lots of gotchas related to types returned by
| various functions and operations.
|
| A particularly egregious example: for a long time, the class for
| univariate polynomial objects was np.poly1d. It had lots of
| conveniences for doing the usual operations on polynomials.
|
| If you multiply a poly1d object P on the right by a complex
| scalar z0, you get what you probably expect: a poly1d with
| coefficients scaled by z0.
|
| If however you multiply P on the left by z0, you get back an
| array of scaled coefficients -- there's a silent type conversion
| happening.
|
| So
|
|       P * z0   # gives a poly1d
|       z0 * P   # gives an array
|
| I know that this is due to Python associativity rules and
| laissez-faire approach to datatypes, but it's fairly ugly to
| debug something like this!
|
| Another fun gotcha with poly1d: if you want to access the leading
| coefficient for a quadratic, you can do so with either P.coef[0]
| or P[2]. No one will ever get these confused, right?
|
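| (A concrete illustration, with a made-up quadratic:)
|
|       import numpy as np
|
|       P = np.poly1d([3, 2, 1])   # 3x^2 + 2x + 1
|       P.coef[0]                  # 3 -- .coef is ordered highest degree first
|       P[2]                       # 3 -- indexing is by power of x, lowest first
|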
| These particular examples aren't really fair because the numpy
| documentation describes poly1d as a part of the "old" polynomial
| API, advising new code to be written with the `Polynomial` API --
| although it doesn't seem it's officially deprecated and no
| warnings are emitted when you use poly1d.
|
| Anyway, there are similar warts throughout the library. Lots of
| gotchas in the shape of silent type conversions and
| inconsistent datatypes returned by the same method depending on
| its inputs, which are downright nightmarish to debug.
| WD-42 wrote:
| The answer to all these complaints is simple: use APL. Or rather
| these days, BQN.
| CreRecombinase wrote:
| It's kind of wild how much work really smart people will do to
| get python to act like Fortran. This is why R is such a great
| language IMO. Get your data read and arrays in order in a dynamic,
| scheme-like language, then just switch to Fortran and write
| actual Fortran like an adult.
| t-kalinowski wrote:
| Or, don't even write the fortran manually, just transpile the R
| function to fortran: https://github.com/t-kalinowski/quickr
| dismalaf wrote:
| This. Writing Fortran is easy as hell nowadays.
|
| But yeah, I learned Fortran to use with R lol. And it is nice.
| Such easy interop.
| _Wintermute wrote:
| R kinda sucks at anything that isn't a dataframe though.
| okanat wrote:
| R sucks for any real programming task where you don't
| hardcode every single thing. It sucks at loading modules from
| your own project. It sucks to find the path of the script
| you're executing.
|
| Basically everything is stateful in R. You call standard
| library functions to install third party libraries ffs. And
| that operation can invoke your C compiler.
|
| Putting R and a repo in a docker container to run it in a
| pipeline where nothing is hardcoded (unlike our datascience
| guy's workspace) was the worst nightmare we had to deal with.
| WCSTombs wrote:
| If your arrays have more than two dimensions, please consider
| using Xarray [1], which adds dimension naming to NumPy arrays.
| Broadcasting and alignment then becomes automatic without needing
| to transpose, add dummy axes, or anything like that. I believe
| that alone solves _most_ of the complaints in the article.
|
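| A minimal sketch of that name-based broadcasting (made-up
| dimension names, not from the thread):
|
|       import numpy as np
|       import xarray as xr
|
|       a = xr.DataArray(np.ones((3, 4)), dims=("time", "x"))
|       b = xr.DataArray(np.ones((5, 3)), dims=("y", "time"))
|       c = a * b   # aligned on "time"; c has dims ("time", "x", "y")
|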
| Compared to NumPy, Xarray is a little thin in certain areas like
| linear algebra, but since it's very easy to drop back to NumPy
| from Xarray, what I've done in the past is add little helper
| functions for any specific NumPy stuff I need that isn't already
| included, so I only need to understand the NumPy version of the
| API well enough one time to write that helper function and its
| tests. (To be clear, though, the majority of NumPy ufuncs are
| supported out of the box.)
|
| I'll finish by saying, to contrast with the author, I don't
| dislike NumPy, but I do find its API and data model to be
| insufficient for truly multidimensional data. For me three
| dimensions is the threshold where using Xarray pays off.
|
| [1] https://xarray.dev
| mgunyho wrote:
| Seconded. Xarray has mostly replaced bare NumPy for me and it
| makes me so much more productive.
| ddtaylor wrote:
| Is there anything similar to this for something like
| Tensorflow, Keras or Pytorch? I haven't used them super
| recently, but in the past I needed to do all of the things you
| just described in painful to debug ways.
| mgunyho wrote:
| For Torch, I have come across Named Tensors, which should
| work in a similar way:
| https://docs.pytorch.org/docs/stable/named_tensor.html
|
| The docs say that it's a prototype feature, and I think it
| has been that way for a few years now, so no idea how
| production-ready it is.
| xiphias2 wrote:
| It's a much worse API than Xarray's; it seems like somebody
| should build an Xarray-like API on top of PyTorch.
| toth wrote:
| For pytorch the analogue is Named Tensors, but it's a
| provisional feature and not supported everywhere.
|
| https://docs.pytorch.org/docs/stable/named_tensor.html
| SimplyUnknown wrote:
| I really like einops. It works for numpy, pytorch and
| keras/tensorflow and has easy named transpose, repeat, and
| einsum operations.
| michaelbarton wrote:
| Thanks for sharing this library. I will give it a try.
|
| For a while I had a feeling that I was perhaps a little crazy
| for seeming to be the only person to really dislike the use of
| things like 'array[:, :, None]' and so forth.
| insane_dreamer wrote:
| along those lines, for biosignal processing, NeuroPype[0] also
| builds on numpy and implements named axes for n-dimensional
| tensors, with the ability to store per-element data (i.e.
| channel names, positions, etc.) for each axis
|
| [0] https://www.neuropype.io/docs/
| renjimen wrote:
| Xarray is great. It marries the best of Pandas with Numpy.
|
| Indexing like `da.sel(x=some_x).isel(t=-1).mean(["y", "z"])`
| makes code so easy to write and understand.
|
| Broadcasting is never ambiguous because dimension names are
| respected.
|
| It's very good for geospatial data, allowing you to work in
| multiple CRSs with the same underlying data.
|
| We also use it a lot for Bayesian modeling via Arviz [1], since
| it makes the extra dimensions you get from sampling your
| posterior easy to handle.
|
| Finally, you can wrap many arrays into datasets, with common
| coordinates shared across the arrays. This allows you to select
| `ds.isel(t=-1)` across every array that has a time dimension.
|
| [1] https://www.arviz.org/en/latest/
| fsndz wrote:
| xarray is nice
| inasio wrote:
| Life goes full circle sometimes. I remember that Numpy roughly
| came out of the amalgamation of the Numeric and Numarray
| libraries; I want to imagine that the Numarray people kept
| fighting these past 20 years to prove they were the right
| solution, at some point took some money from Elon Musk and
| renamed to Xarray [0], and finally started beating Numpy.
|
| [0] most of the above is fiction
| neonsunset wrote:
| Sighs in F# and Julia
| sundarurfriend wrote:
| I understand the Julia part, but does F# have any particularly
| good facilities or libraries for this kind of (fast) array
| manipulation? It's been getting a good amount of love on HN
| lately, but I don't know of any specific strengths of it
| (beyond the general sense of "functional language with access
| to .NET ecosystem").
| int_19h wrote:
| https://github.com/fsprojects/awesome-
| fsharp/blob/main/READM...
| sundarurfriend wrote:
| I was more curious why they chose F# among the many many
| languages out there, and this kind of list unfortunately is
| not very useful for that - similar lists exist in pretty
| much every awesome-$language page out there. They don't
| give us any idea of how robust or featureful the ecosystem
| and support is, which are the things that would
| differentiate "this too supports it, kinda" (which is every
| major language out there) and "this is really well-
| supported and ergonomic".
| semiinfinitely wrote:
| all issues raised are addressed by jax.vmap
| patrickkidger wrote:
| Went scrolling looking for this! Most of the article is about
| problems solved in JAX.
|
| Also worth noting the Array API standard exists now. This is
| generally also trying to straighten out the sharp edges.
| amensch wrote:
| Same here, beautiful solution
| the__alchemist wrote:
| I'm with the author here. Great for simple use cases. Anything
| beyond that, and I find the syntax inscrutable. It's like using a
| terse, feature-rich DSL like Regex. Good luck deciphering someone
| else's numpy code (or your own from a month back) unless you
| really take the time to commit it to memory.
| aborsy wrote:
| My main problem with numpy is that the syntax is verbose. I know
| from the programming language perspective it may not be
| considered a drawback (might even be a strength). But in
| practice, the syntax is a pain compared to Matlab or Julia. The
| code for the latter is easier to read and understand, and more
| consistent with math notation.
| nis251413 wrote:
| The syntax of actual array languages can be beautifully concise
| and expressive. You can express mathematical formulas in ways
| that make sense when you read a single line, and once you get
| used to the notation and some originally non-intuitive quirks
| (from a programming background perspective), you can write in
| very few lines what otherwise would take you several lines of
| rather ugly, less readable code.
|
| In my view, python+numpy is not actually an array language.
| Numpy as a library adds vectorization operations to python in
| order to help with speed. This is different. It does not
| (intend to) bring the advantages that array language syntax
| has, even if it was a bit more consistent.
| throwaway314155 wrote:
| Does numpy market itself as an array language or something?
| sunrunner wrote:
| "Young man, in numpy you don't understand things. You just get
| used to them." -- John von Neumann (probably)
| xg15 wrote:
| I'm pretty sure it was Einstein.
| sunrunner wrote:
| Darn, I thought it was Feynman and checked it beforehand.
| Maybe it was Von Neumann quoting Einstein (himself
| quoting...)
| gjm11 wrote:
| I think xg15 was just making a joke about the tendency for
| every quotation that could possibly be assigned to Einstein
| to be assigned to Einstein.
|
| (Perhaps you too are making a joke and it's gone over my
| head, in which case my apologies.)
| JJMcJ wrote:
| It has a von Neumann sound to it.
| bee_rider wrote:
| Numpy is mostly just an interface to BLAS/Lapack, but for Python,
| right? BLAS/Lapack aren't clever libraries for doing a ton of
| small operations, they are simple libraries for doing the easy
| thing (operations on big dense matrices) about as well as the
| hardware can.
|
| Numpy is what it is. Seems more disappointing that he had trouble
| with the more flexible libraries like Jax.
|
| Anyway there's a clear split between the sort of functionality
| that Numpy specializes in and the sort that Jax does, and they
| don't benefit much from stepping on each other's toes, right?
| ChrisRackauckas wrote:
| One of the reasons why I started using Julia was because the
| NumPy syntax was so difficult. Going from MATLAB to NumPy I felt
| like I suddenly became a mediocre programmer, spending less time
| on math and more time on "performance engineering" of just trying
| to figure out how to use NumPy right. Then when I went to Julia
| it made sense to vectorize when it felt good and write a loop
| when it felt good. Because both are fast, you can focus on what
| makes the code easiest to read and understand. This blog post
| encapsulates exactly that experience and feeling.
|
| Also, treating things like `np.linalg.solve` as a black box that
| is the fastest thing in the world and you could never do any
| better, so please mangle your code to call it correctly... that's
| just wrong. There are many reasons to build problem-specific
| linear algebra kernels, and that's something that is inaccessible
| without going deeper. But that's a different story.
| Bostonian wrote:
| "Then when I went to Julia it made sense to vectorize when it
| felt good and write a loop when it felt good. Because both are
| fast, focus on what makes the code easiest to read an
| understand."
|
| This is true of modern Fortran also.
| sundarurfriend wrote:
| The amount of work that's been put into Fortran by the
| LFortran [1] folks and others in recent years [2] is nothing
| short of absurdly impressive!
|
| [1] https://lfortran.org/
|
| [2] "in recent years" because that's when I became aware of
| this stuff, not to say there wasn't effort before then
| pklausler wrote:
| If you're going to give praise, include the GNU Fortran
| developers, who have really set the standard for open
| source production quality Fortran compiler development.
| sundarurfriend wrote:
| That sounds great, kudos to them too!
|
| The only times I set foot in the Fortran world are when
| some Julia-related development [1] [2] takes me there, and then
| I peruse some surrounding links and related projects - so I have
| very little familiarity with GNU Fortran myself.
|
| [1] https://fortran-lang.discourse.group/t/update-on-
| prima-a-jul... [2] https://discourse.julialang.org/t/ann-
| julia-back-end-for-lfo...
| ChrisRackauckas wrote:
| Indeed, I also like modern Fortran.
| jampekka wrote:
| > Going from MATLAB to NumPy I felt like I suddenly became a
| mediocre programmer, spending less time on math and more time
| on "performance engineering" of just trying to figure out how
| to use NumPy right.
|
| Matlab is about as slow without readily vectorized operations
| as Python.
|
| Slowness of Python is a huge pain point, and Julia has a clear
| advantage here. Sadly Julia is practically just unusable beyond
| quite niche purposes.
|
| Python does now have quite serviceable jit hacks that let one
| escape vectorization tricks, but they are still hacks, and a
| performant Python alternative would be very welcome. Sadly
| there aren't any.
| AnotherGoodName wrote:
| One of the things I suspect will happen very soon is that all
| languages will become as fast as each other, and your reason
| to use one over the other for performance reasons might not
| exist. I know this sounds like optimistic and wishful thinking,
| but hear me out here;
|
| Modern AIs are literal translation engines. That's their
| origin. The context that lets them do what they do was
| originally there to allow good translations. It doesn't
| matter if the translation is programming language output, or
| actual language output. I can today ask an AI to rewrite a
| chunk of Matlab code into Rust and it works! There's even
| code generators that will utilize the GPU where sensible.
|
| We're really not that far from being able to write code in any
| language and, transparently behind the scenes, have it actually
| running on a more performant backend when needed. Ideally you
| keep the Matlab/Python frontend and the translation happens
| on the intermediate layers in a two-way fashion so step
| through/debug works as expected.
|
| Based on playing with the above steps manually with good
| results we're almost at the stage of just needing some glue
| to make it all work. Write in Matlab/Python, and run as fast
| as any LLVM backed language.
| codethief wrote:
| Sounds like transpiling Typescript to JavaScript, except
| that you translate to a vastly different language, thereby
| making you, the programmer, largely unable to reason about
| the performance characteristics of your code ("Can the
| transpiler optimize this Python loop or nah?"), and you
| also throw in the indeterminism of LLMs. Oooof, I'm not
| sure I'd want that.
| AnotherGoodName wrote:
| When you're using Python the type of performance
| characteristics you're usually thinking of are "can I
| call an external library that runs this on the GPU" or
| "is this log(n) or n^2 complexity?", and such language
| translations shouldn't change those parts.
|
| As in, the language is regularly benchmarked at 10-100x
| slower than other languages. It's not a language for
| caring about subtle performance characteristics outside
| "am I calling the external library correctly or not?" (as
| in this article) or "is this algorithm exponential
| time?".
| Zambyte wrote:
| > Sadly Julia is practically just unusable beyond quite niche
| purposes.
|
| Why? Just the ecosystem?
| jampekka wrote:
| Mainly the lack of practical support for non-REPL/Notebook
| usage and interoperability.
| bandrami wrote:
| R
| jampekka wrote:
| R is for sure not a Python replacement. And it's even
| slower than Python.
| abdullahkhalids wrote:
| The reason is quite simple. Julia is a language designed for
| scientific computation. Numpy is a library frankenstein-grafted
| onto a language that isn't really designed for scientific
| computation.
|
| We can only hope that Julia somehow wins and those of us forced
| to work in python because of network effects can be freed.
| amluto wrote:
| Is MATLAB materially different? Loops are slow (how slow
| depends on the version), and the fastest thing in the world is
| the utter perfection of the black box called '\'.
| munchler wrote:
| I have to agree with this as someone coming from a strongly-typed
| background (F#). PyTorch and NumPy are very flexible and
| powerful, but their APIs are insanely underspecified because
| every function seems to take multiple combinations of vaguely-
| typed objects. The library just figures out the right thing to do
| at runtime using broadcasting or other magic.
|
| This kind of "clever" API seems to be both a benefit and curse of
| the Python ecosystem in general. It makes getting started very
| easy, but also makes mastery maddeningly difficult.
| coolcase wrote:
| Broadcasting is good, but would be nice if it were explicit.
|
| Maybe it being explicit gets in the way and becomes inelegant
| once you get some experience.
| kccqzy wrote:
| Broadcasting is a useful and powerful feature. It's precisely
| specified and easily learned. However, very few type systems in
| real world languages are powerful enough to express the concept
| of broadcasting in the type system.
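|
| (The rule in a nutshell, with a made-up example: shapes align
| right-to-left, and size-1 axes stretch.)
|
|       import numpy as np
|
|       a = np.ones((8, 1, 5))
|       b = np.ones((4, 5))
|       (a + b).shape   # (8, 4, 5): b aligns as (1, 4, 5), the 1s stretch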
| lvl155 wrote:
| I am not a big fan of Python data libraries. They're not cohesive
| in style across the board. Probably why I found R to be a better
| "classroom" solution. Julia is nice and so is Mathematica for
| pure math (and hat tip to Maple).
| a_t48 wrote:
| My issue with it is how easy it is to allocate all over the place
| if you forget to use inplace operations. It's even worse with
| cupy - rather than applying a series of operations to some data
| to produce some other data, you end up producing a set of data
| for each operation. Yes, there are workarounds, but they aren't
| as ergonomic (cupy.fuse() almost does the right thing, cleanly,
| but is a step you have to remember to use, and doesn't really
| work for anything that requires multiple shapes of array).
| sundarurfriend wrote:
| > D = 1/(L*M) * np.einsum('klm,ln,km->kn', A, B, C)
|
| The first time I came across einsums was via the Tullio.jl
| package, and it seemed like magic to me. I believe the
| equivalent of this would be:
|
|       @tullio D[k, n] = 1/(L*M) * A[k, l, m] * B[l, n] * C[k, m]
|
| which is really close to the mathematical notation.
|
| To my understanding, template strings from PEP 750 will allow
| for something like:
|
|       D = 1/(L*M) * np.einsum(t'{A}[k,l,m] * {B}[l,n] * {C}[k,m]')
|
| right? If so, that'd be pretty neat to have.
| SirHumphrey wrote:
| The main problem (from my point of view) of the python data
| science ecosystem is a complete lack of standardization on
| anything.
|
| You have ten different libraries that try to behave like 4 other
| languages and the only point of standardization is that there is
| usually something like a .to_numpy() function. This means that most
| of the time I was not solving any specific problem related to
| data processing, but just converting data from a format one
| library would understand to something another library would
| understand.
|
| In Julia (a language with its own problems, of course) things
| mostly just work. The library for calculating uncertainties
| interacts well with a library handling units, and all this works
| fine with the plotting library, or libraries solving differential
| equations, etc. In python, this required quite a lot of
| boilerplate.
| HdS84 wrote:
| R with its 4(?) class systems enters the chat.
| ChrisRackauckas wrote:
| It's at least 5 at this point.
| rienbdj wrote:
| In defense of R the class systems do have different
| characteristics and they are not deeply embedded in the
| language or anything.
| Evidlo wrote:
| Nobody has mentioned array-api (and data-apis more generally),
| which is trying to standardize the way people interact with
| arrays across the Python ecosystem.
|
| https://github.com/data-apis/array-api
|
| https://data-apis.org/blog/announcing_the_consortium/
| bee_rider wrote:
| A dumb thought: technically scipy has a "solve_banded" function
| that does a banded solve. He could easily recast his problem into
| a single big diagonal problem, I guess, just with some silly
| explicit zeros added. I wonder how the performance of that would
| compare, to iterating over a bunch of tiny solves.
|
| Of course, it would be nice if scipy had a block diagonal solver
| (maybe it does?). But yea, I mean, of course it would be nice if
| my problem was always built in functionality of the library, lol.
|
| Maybe a bsr_matrix could work.
| Imnimo wrote:
| The trouble is that I can never tell when the cause of my issues
| is "numpy's syntax/documentation is bad" and when it's "I am very
| bad at thinking about broadcasting and shape arithmetic".
| FrameworkFred wrote:
| I'm actually super-interested to see the next post.
|
| TBH, if you would've asked me yesterday if I'm the sort of person
| who might get sucked in by a cliffhanger story about a numpy
| replacement, I'm pretty sure I would've been an emphatic no.
|
| But I have, in fact, just tried random things in numpy until
| something worked...so, you know...tell me more...
| __mharrison__ wrote:
| Multiple dimensions (more than 2) are hard.
|
| I was at a conference the other day, and my friend (a very smart
| professor) was asking me if it would be possible to move away
| from Xarray to Pandas or Polars...
|
| Perhaps using Numba or Cython (with loops) might make it fast but
| less prone to confusion.
|
| Luckily for me, I mostly stay in tabular data (< 3 dimensions).
| threeducks wrote:
| I thought I'd do something smart and inline all the matrix
| multiplications into the einsums of the vectorized multi-head
| attention implementation from the article and set
| optimize="optimal" to make use of the optimal matrix chain
| multiplication algorithm
| https://en.wikipedia.org/wiki/Matrix_chain_multiplication to
| get a nice performance boost.
|
|       def multi_head_attention_golfed(X, W_q, W_k, W_v, W_o, optimize="optimal"):
|           scores = np.einsum('si,hij,tm,hmj->hst', X, W_q, X, W_k, optimize=optimize)
|           weights = softmax(W_k.shape[-1]**-0.5 * scores, axis=-1)
|           projected = np.einsum('hst,ti,hiv,hvd->shd', weights, X, W_v, W_o, optimize=optimize)
|           return projected.reshape(X.shape[0], W_v.shape[2])
|
| This is indeed twice as fast as the vectorized implementation,
| but, disappointingly, the naive implementation with loops is even
| faster. Here is the code if someone wants to figure out why the
| performance is like that: https://pastebin.com/raw/peptFyCw
|
| My guess is that einsum could do a better job of considering
| cache coherency when evaluating the sum.
| jamesblonde wrote:
| In Data for ML, everything has switched from NumPy (Pandas) to
| Arrow (Polars, DuckDB, Spark, Pandas 2.x, etc). However, Scikit-
| Learn is still a holdout, so it's Arrow from your data sources
| all the way to the pre-processing pipelines in Scikit-Learn, where
| you have to go back to NumPy. In practice, it now makes more sense to
| separate feature pipelines in Arrow from training pipelines with
| Pandas/NumPy and Scikit-Learn.*
|
| *This is ML, not Deep Learning or Transformers.
| ris wrote:
| I would much _much_ prefer the cited "awful" syntax over the
| proposed loop syntax any day. Don't make me run a little virtual
| machine in my head to figure out what the end result of a block
| of code is going to be.
| eurekin wrote:
| That's one area where LLMs have hugely helped. You can just ask
| one, and ask it to generate tests to verify.
| srean wrote:
| I for one like Numpy's broadcast far better than Matlab's. In
| Numpy it's a logical operation, no unnecessary memory/copying
| cost.
|
| Last time I checked Matlab (that was surely a decade ago) it
| actually filled memory with copied data.
| harpiaharpyja wrote:
| I don't use numpy much, but based on the documentation for that
| function and the fact that you can index by None to add another
| dimension, it seems like the correct call is:
|
|       y = linalg.solve(A, x[:, :, None])
|
| The documentation given says that A should be (..., M, M), and x
| should be (..., M, K). So if A is 100x5x5 and x is 100x5, then
| all you need to do is convert x to 100x5x1.
|
| Is that right? It doesn't seem that bad.
| geysersam wrote:
| You're right, but if you want `y` to have the same shape as `x`
| I think you also need to slice it after:
|
|       y = linalg.solve(A, x[..., None])[..., 0]
|
| I don't really mind numpy syntax much, it gets the job done in
| most scenarios. Numba complements it really well when the code
| is easier to express like a couple of nested loops.
|
| I used Julia for a while, but found it easier to predict the
| performance of python+numpy+numba; Julia has a few footguns and
| the python ecosystem is just insanely polished.
| credit_guy wrote:
| Here's how you do it: focus on the simple case, solve it, then
| ask Copilot to vectorize the code. Life is too short.
| coolcase wrote:
| Sounds like a hammers-and-nails problem. Your nail gun is
| called PyTorch.
| hatthew wrote:
| Am I the only one who feels like this is a rant where the author
| is projecting their unfamiliarity with numpy? Most of the
| examples seem like straw men, and I can't tell if that's because
| they are or if the author genuinely thinks that these are all
| significant problems.
|
| For example, in the linalg.solve problem, based on reading the
| documentation for <60 seconds I had two guesses for what would
| work, and from experimenting it looks like both work equally. If
| you don't want to take the time to understand the documentation
| or experiment to see what works, then just write the for loop.
|
| For indexing problems, how about just don't use weird indexing
| techniques if you don't understand them? I have literally never
| needed to use a list of lists to index an array in my years of
| using numpy. And if you do need to use it for some reason, spend
| 1 minute to test out what happens. "What shape does B have?" is a
| question that can be answered with `print(B.shape)`. If the
| concern is about reading code written by other people, then
| context should make it clear what's happening, and if it's not
| clear then that's the fault of the person who wrote the code for
| not using sensible variable names/comments.
| kccqzy wrote:
| I agree this is a rant. I don't think the author is projecting
| their unfamiliarity here, but they are complaining that numpy
| is too difficult to learn, especially that style where you
| avoid all Python loops.
| fscknumpy wrote:
| I had to make a HN account just to show this numpy "feature".
|
| Create an array like this np.zeros((10, 20, 30)), with shape (10,
| 20, 30).
|
| Clearly if you ask for array[0][:, [0,1]].shape you will get an
| output of shape (20, 2), because you first remove the first
| dimension, and then : broadcasts across the second dimension, and
| you get the first two elements of the third dimension.
|
| Also, obviously we know that in numpy array[a][b][c] ==
| array[a,b,c]. Therefore, what do you think array[0, :,
| [0,1]].shape returns? (20, 2)? Nope. (2, 20).
|
| Why? Nobody knows.
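|
| (For what it's worth, the documented rule: in mixed
| basic/advanced indexing, when the advanced indices are
| separated by a slice, NumPy cannot keep their result
| dimensions in place, so the broadcast advanced dimensions are
| moved to the front. A sketch:)
|
|       import numpy as np
|
|       a = np.zeros((10, 20, 30))
|       a[0][:, [0, 1]].shape   # (20, 2): two ordinary indexing steps
|       a[0, :, [0, 1]].shape   # (2, 20): advanced indices 0 and [0, 1] are
|                               # separated by the slice, so their broadcast
|                               # result (shape (2,)) goes first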
| drhagen wrote:
| > Personally, I think np.einsum is one of the rare parts of NumPy
| that's actually good.
|
| einsum only being able to do multiplication makes it quite
| limited. If we leaned into the Einstein notation (e.g. [1]), we
| could make something that is both quite nice and quite
| performant.
|
| [1] https://tensora.drhagen.com/
| kccqzy wrote:
| I have an unpopular opinion: I don't like numpy.einsum because it
| is too different from the rest of numpy. You label your axes with
| letters but none of the other regular numpy functions do that. I
| usually avoid using numpy.einsum in favor of using a combination
| of indexing notation with numpy.newaxis (None), broadcasting, and
| numpy.swapaxes.
|
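| (A made-up example of the trade -- the same contraction written
| both ways:)
|
|       import numpy as np
|
|       A = np.random.rand(7, 3, 4)
|       B = np.random.rand(4, 5)
|       C1 = np.einsum('kij,jn->kin', A, B)                   # labeled axes
|       C2 = (A[:, :, :, None] * B[None, None]).sum(axis=2)   # newaxis + broadcast
|       np.allclose(C1, C2)   # True
|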
| And I learned from someone more senior than me that you should
| instead label your variables with single-letter axes names. This
| way, the reader reads regular non-einsum operations and they
| still have the axes information in their mental model. And when
| you use numpy.einsum, that axis-labeling information becomes
| duplicated.
| zdevito wrote:
| I tried to do something similar with 'first-class' dimension
| objects in PyTorch
| https://github.com/pytorch/pytorch/blob/main/functorch/dim/R... .
| For instance multi-head attention looks like:
|       from torchdim import softmax
|
|       def multiheadattention(q, k, v, num_attention_heads,
|                              dropout_prob, use_positional_embedding):
|           batch, query_sequence, key_sequence, heads, features = dims(5)
|           heads.size = num_attention_heads
|
|           # binding dimensions, and unflattening the heads from the feature dimension
|           q = q[batch, query_sequence, [heads, features]]
|           k = k[batch, key_sequence, [heads, features]]
|           v = v[batch, key_sequence, [heads, features]]
|
|           # einsum-style operators to calculate scores
|           attention_scores = (q*k).sum(features) * (features.size ** -0.5)
|
|           # use first-class dim to specify dimension for softmax
|           attention_probs = softmax(attention_scores, dim=key_sequence)
|
|           # dropout works pointwise, following Rule #1
|           attention_probs = torch.nn.functional.dropout(attention_probs, p=dropout_prob)
|
|           # another matrix product
|           context_layer = (attention_probs*v).sum(key_sequence)
|
|           # flatten heads back into features
|           return context_layer.order(batch, query_sequence, [heads, features])
|
| However, my impression from trying to get a wider userbase is
| that while numpy-style APIs are maybe not as good as some better
| array language, they might not be the bottleneck for getting
| things done in PyTorch. Other domains might suffer more, though,
| and I am very excited to see a better array language catch on.
| cycomanic wrote:
| I sort of agree with the author that N>3 dimensional arrays are
| cumbersome in numpy; that said, I think this is partly because
| we are really not that great at thinking in higher dimensions.
| I'm interested in what the author's solution to the problem is,
| but unlike the author I'm not a big fan of the einsum notation,
| though maybe I just don't use it often enough. I don't see the
| issue with a[:,:,None] notation and that's never given me
| issues, but I agree about the issue with index arrays. I often
| make something which I think should work and then need to go
| back to the documentation to realise no, that's not how it
| works.
|
| The inconsistency in argument naming is also annoying (even more
| if we include scipy), e.g. why is it np.fft.fft(x, axis=1) but
| np.fft.fftshift(x, axes=1)?!
| josefrichter wrote:
| You could try Elixir's Nx and related ecosystem.
| janalsncm wrote:
| I would like to be able to turn off implicit broadcasting
| entirely. It has caused me so many evil bugs (PyTorch, but the
| same thing applies to Numpy).
|
| Imagine all of the weirdness of js "truthiness" bugs (with silent
| unexpected behavior) combined with the inherent difficulty of
| debugging matrix multiplications in 2, 3, or 4 dimensions. I
| would rather herd goats in a Lumon basement.
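|
| (The classic footgun, sketched with made-up shapes:)
|
|       import numpy as np
|
|       pred = np.zeros(5)          # shape (5,)
|       target = np.zeros((5, 1))   # shape (5, 1)
|       (pred - target).shape       # (5, 5) -- silently broadcast, no error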
|
| Jax can do it but I'm not willing to migrate everything.
| frollogaston wrote:
| About optimization, I feel like NumPy is meant to be feature-
| rich, convenient, and decently fast, not the fastest thing
| possible. There are semi drop in replacements that use more CPU
| parallelism or GPU, but they don't cover everything. Just wish it
| were clearer _which_ NumPy build I'm installing, cause
| apparently `pip3 install numpy` on my Mac gave me something built
| with the worst flags possible.
|
| About >2 dimensions, I always found this confusing in NumPy but
| just chalked it up to >2 dim arrays being inherently confusing.
| Maybe there really is a better way.
___________________________________________________________________
(page generated 2025-05-15 23:00 UTC)