[HN Gopher] I don't like NumPy
       ___________________________________________________________________
        
       I don't like NumPy
        
       Author : MinimalAction
       Score  : 300 points
       Date   : 2025-05-15 16:05 UTC (6 hours ago)
        
 (HTM) web link (dynomight.net)
 (TXT) w3m dump (dynomight.net)
        
       | dima55 wrote:
       | Hear hear! Some of these complaints have been resolved with
       | numpysane: https://github.com/dkogan/numpysane/ . With numpysane
       | and gnuplotlib, I now find numpy acceptable and use it heavily
       | for everything. But yeah; without these it's unusable.
        
         | alanbernstein wrote:
         | Thanks for the link, I have also grumbled about these issues,
         | but never thought to look for a workaround library on top of
         | numpy...
        
       | blululu wrote:
       | The author brings up some fair points. I feel like I had all
       | sorts of small grievances transitioning from Matlab to Numpy.
       | Slicing data still feels worse on Numpy than Matlab or Julia, but
       | this doesn't justify the licensing costs for the Matlab
       | stats/signal processing toolbox.
       | 
        | The issues presented in this article mostly relate to tensors
        | of rank >2. NumPy was originally just matrices, so it is not
        | surprising that it has problems in this domain. A dedicated
       | library like Torch is certainly better. But Torch is difficult in
       | its own ways. IDK, I guess the author's conclusion that numpy is
       | "the worst array language other than all the other array
       | languages" feels correct. Maybe a lack of imagination on my part.
        
         | jampekka wrote:
          | Numpy was about N-dimensional arrays from the get-go. It was a
         | continuation of numarray, which was Nd.
        
       | theLiminator wrote:
       | Anyone use xarray? Curious how it compares?
        
         | apetrov wrote:
          | My experience was positive in the context of PyMC and ArviZ
          | (as a way to represent posterior and posterior predictive
          | distributions). The API is sane and quite easy to get around.
        
         | xg15 wrote:
         | Looks great, but doesn't support pytorch yet, only numpy, or
         | does it?
         | 
         | https://github.com/pydata/xarray/issues/3232
        
       | rrr_oh_man wrote:
       | Coming from the excellent tidyverse or even data.table in R,
       | Numpy always has felt like twenty steps back into mindfuck
       | territory.
        
         | acc_297 wrote:
         | There's also the fantastic "tidytable" library. I wouldn't want
         | to implement multi-headed self attention in either of those
         | libraries though.
         | 
         | I've done only very simple ML stuff using R and it never feels
         | like exactly the right tool.
        
           | frakt0x90 wrote:
           | I had to write a normal codebase in R exactly one time and it
           | was one of the weirdest and most unpleasant coding
           | experiences I've had. After that, I decided R _is_ tidyverse
           | and a handful of stats libraries. If you need more, reach for
           | another tool.
        
             | lottin wrote:
             | I never understood the appeal of tidyverse. I have a
             | colleague who, like you, thinks that R is tidyverse. He
             | also has the nasty habit of starting Excel formulas with a
             | '+' sign.
        
               | jampekka wrote:
               | The main attraction of tidyverse is that it's easy to
               | copy-paste code for common cases. If there's no ready
               | recipe for something, it's usually just not done, or some
               | "technical person" is called to do it.
               | 
               | R is used mostly as a fancy calculator, which is fine in
               | itself, but it makes the comparisons to general purpose
               | languages like Python quite apples-to-oranges.
        
         | jampekka wrote:
         | Numpy and tidyverse/data.table are not really on the same level
         | of abstraction. Something like Pandas would be (and it
         | definitely has its warts).
         | 
         | Doing the lower level stuff that NumPy is used for is really
         | mindfuck territory in R.
        
       | zahlman wrote:
       | TFA keeps repeating "you can't use loops", but aren't they, like,
       | _merely_ less performant? I understand that there are going to be
       | people out there doing complex algorithms (perhaps part of an ML
       | system) where that performance is crucial and you might as well
       | not be using NumPy in the first place if you skip any
        | opportunities to do things in The Clever NumPy Way. But say I'm
       | just, like, processing a video frame by frame, by using TCNW on
       | each frame and iterating over the time dimension; surely that
       | won't matter?
       | 
        | Also: TIL you can apparently use _multi-dimensional_ NumPy arrays
        | as NumPy array indexers, and they don't just collapse into
       | 1-dimensional iterables. I expected `A[:,i,j,:]` not to work, or
       | to be the same as if `j` were just `(0, 1)`. But instead, it
       | apparently causes transposition with the previous dimension... ?
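        | 
        | Here's the minimal shape experiment I ended up running (this is
        | just my reading of the advanced-indexing rules, so treat it as
        | a sketch):
        | 
        |     import numpy as np
        | 
        |     A = np.zeros((2, 3, 4, 5))
        |     i = np.array([0, 1])  # integer index arrays broadcast
        |     j = np.array([0, 1])  # together instead of iterating
        | 
        |     # Adjacent advanced indices: the broadcast shape replaces
        |     # the indexed dimensions in place.
        |     print(A[:, i, j, :].shape)  # (2, 2, 5)
        | 
        |     # Advanced indices separated by a slice: the broadcast
        |     # shape moves to the front -- the "transposition".
        |     print(A[i, :, j, :].shape)  # (2, 3, 5)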
        
         | nyeah wrote:
         | Right, you can use loops. But then it goes much slower than a
         | GPU permits.
        
           | zahlman wrote:
           | My point is that plenty of people use NumPy for reasons that
           | have nothing to do with a GPU.
        
             | crazygringo wrote:
             | The whole point of NumPy is to make things much, much
             | faster than interpreted Python, whether you're GPU-
             | accelerated or not.
             | 
             | Even code you write now, you may need to GPU accelerate
             | later, as your simulations grow.
             | 
              | Falling back on loops is against the entire reason for using
             | NumPy in the first place.
        
               | nyeah wrote:
               | I really disagree. That's not the only point of NumPy. A
               | lot of people use it like Matlab, to answer questions
               | with minimal coding time, not minimal runtime.
        
               | crazygringo wrote:
               | I mean sure, the fact that it is performant means tons of
               | functionality is built on it that is hard to find
               | elsewhere.
               | 
               | But the point is still that the main purpose in building
               | it was to be performant. To be accelerated. Even if
               | that's not why you're personally using it.
               | 
               | I mean, I use my M4 Mac's Spotlight to do simple
               | arithmetic. That's not the main point in building the M4
               | chip though.
        
               | nyeah wrote:
               | As best I can see, you weren't originally talking about
               | the reason for creating NumPy. Instead you seemed to be
               | talking about the reason for using NumPy.
        
             | nyeah wrote:
             | I mean yes. Also in your example where you hardly spend any
             | time running Python code, the performance difference likely
             | wouldn't matter.
        
           | cycomanic wrote:
           | But once you need to use the GPU you need to go to another
           | framework anyway (e.g. jax, tensorflow, arrayfire, numba...).
            | AFAIK many of those can parallelise loops using their JIT
            | functionality (in fact, e.g. Numba's JIT for a long time
            | could not deal with numpy broadcasting, so you had to write
            | out your loops). So you're not really running into a
            | problem?
        
         | yongjik wrote:
         | "merely less performant" is severely underselling it. It could
         | easily add a few zeros to your execution time.
         | 
         | (And that's before you even consider GPUs.)
        
         | KeplerBoy wrote:
         | It's a slippery slope. Sometimes a python loop outside some
         | numpy logic is fine but it's insane how much perf you can leave
         | on the table if you overdo it.
         | 
          | It's not just Python adding interpreter overhead; you also
          | risk creating a lot of temporary arrays, i.e. costly mallocs
          | and memcopies.
        
         | jerf wrote:
          | You can draw out a sort of performance hierarchy, from
          | fastest to slowest:
          | 
          |     * Optimized GPU code
          |     * CPU vectorized code
          |     * Static CPU unvectorized code
          |     * Dynamic CPU code
         | 
         | where the last one refers to the fact that a language like
         | Python, in order to add two numbers together in its native,
         | pure-Python mode, does a lot of boxing, unboxing, resolving of
         | class types and checking for overrides, etc.
         | 
          | Each of those is _at least_ an order of magnitude slower than
          | the next one up the hierarchy, and most of them appreciably
          | more than one. As a back-of-the-envelope understanding,
          | you're probably better off thinking of each step as more
          | like 1.5 orders of magnitude.
         | 
         | Using NumPy incorrectly can accidentally take you from the top
         | one, all the way to the bottom one, in one fell swoop. That can
         | be a big deal, real quick. Or real slow, as the case may be.
         | 
         | In more complicated scenarios, it matters how much computation
         | is going how far down that hierarchy. If by "processing a video
         | frame by frame" you mean something like "I wrote a for loop on
         | the frames but all the math is still in NumPy", you've taken
         | "iterating on frames" from the top to the bottom, but who
         | cares, Python can iterate on even a million things plenty
          | quickly, especially with everything else that is going on. If,
          | by contrast, you mean that at some point you're iterating over
          | each pixel in pure Python, you just fell all the way down that
          | hierarchy for each _pixel_ and you're in bigger trouble.
         | 
         | In my opinionated opinion, the trouble isn't so much that it's
         | possible to fall down that stack. That is arguably a feature,
         | after all; surely we should have the capability of doing that
         | sort of thing if we want. The problem is how easy it is to do
         | without realizing it, just by using Python in what looks like
         | perfectly sensible ways. If you aren't a systems engineer it
          | can be hard to tell you've fallen, and even if you are,
          | honestly, the docs don't make it particularly easy to figure
          | out.
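          | 
          | As a rough illustration of the bottom two rungs (a quick,
          | unscientific sketch; exact ratios depend on hardware and
          | sizes):
          | 
          |     import time
          |     import numpy as np
          | 
          |     frames = np.random.rand(100, 480, 640)
          | 
          |     # Loop over frames, vectorized math inside: ~100
          |     # interpreter iterations, negligible overhead.
          |     t0 = time.perf_counter()
          |     out = [2.0 * f + 1.0 for f in frames]
          |     print("per-frame loop:", time.perf_counter() - t0)
          | 
          |     # Loop over pixels in pure Python: ~300k interpreter
          |     # iterations for a single frame.
          |     t0 = time.perf_counter()
          |     g = frames[0].copy()
          |     for y in range(g.shape[0]):
          |         for x in range(g.shape[1]):
          |             g[y, x] = 2.0 * g[y, x] + 1.0
          |     print("per-pixel loop:", time.perf_counter() - t0)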
        
           | HdS84 wrote:
            | As a simple example: once upon a time we needed to generate
            | a sort of heat map. Doing it in pure Python takes a few
            | seconds at the desired size (a few thousand cells where
            | each cell needs a small formula). Dropping to NumPy brought
            | that down to hundreds of milliseconds. Pushing it to pure C
            | got us to tens of milliseconds.
        
             | DrFalkyn wrote:
              | Yeah, one of the other beauties of numpy is you can pass
              | data to/from native shared libraries compiled from C code
              | with little overhead. This was more kludgy in Matlab last
              | I checked.
        
           | maximilianroos wrote:
           | that's a great hierarchy!
           | 
           | though what does "static cpu" vs "dynamic cpu" mean? it's one
           | thing to be pointer chasing and missing the cache like OCaml
           | can, it's another to be running a full interpreter loop to
           | add two numbers like python does
        
           | coolcase wrote:
            | Plus it isn't a checkbox on a UI, where Electron being 1000
            | times slower (1 ms instead of 1 us) would barely be
            | noticeable.
            | 
            | It could be a 12 hour run vs. a 12,000 hour run.
        
       | complex_pi wrote:
       | NumPy allows a lot of science to happen. Grievance is fine but a
       | little respect is as well.
       | 
       | NumPy is the lingua franca for storing and passing arrays in
       | memory in Python.
       | 
       | Thank you NumPy!
        
         | tptacek wrote:
         | I get the impression the author is pretty familiar with what
         | NumPy is.
        
           | jampekka wrote:
           | And given the ending of TFA the author would likely agree
           | with GP.
        
         | rekenaut wrote:
          | This is a great aspect of it, but that doesn't diminish that
          | it feels so tedious to work with compared to Julia and Matlab.
          | Some of that is just from trying to shoehorn Python into
          | being a scientific computing language, but I think it's time
          | to revisit
         | whether Python should have first party support for
         | vectorization, arrays in memory, and broadcasting as Julia and
         | Matlab have.
        
           | complex_pi wrote:
            | NumPy led to a PEP for the in-memory representation of
            | arrays in Python. This is tremendously useful for making
            | APIs.
        
           | jampekka wrote:
           | I've never understood the "so tedious" argument of Python vs
            | Matlab. Sure, having to import numpy and use np.array to
            | create numpy arrays is a bit of typing, but other than that
            | I don't see major differences.
            | 
            | Given what an inconsistent mess Matlab is aside from matrix
            | manipulation, I take NumPy any day.
        
       | nayuki wrote:
       | I skimmed the article and agree with it at a high level, though I
       | haven't faced most of those problems personally.
       | 
       | I have several gripes with NumPy, or more broadly the notion of
       | using Python to call a C/asm library that vectorizes math
       | operations. A lot of people speak of NumPy like the solution to
       | all your high-performance math needs in Python, which I think is
       | disingenuous. The more custom logic you do, the less suitable the
       | tools get. Pure Python numeric code is incredibly slow - like
       | 1/30x compared to Java - and as you find parts that can't be
       | vectorized, you have to drop back down to pure Python.
       | 
        | I would like to give the simple example of the sieve of
        | Eratosthenes:
        | 
        |     def sieve_primeness(limit):
        |         result = [False] + [True] * limit
        |         for i in range(2, len(result)):
        |             if result[i]:
        |                 for j in range(i * i, len(result), i):
        |                     result[j] = False
       | 
       | This code is scalar, and porting it to a language like
       | C/C++/Rust/Java gives decent performance straight out of the box.
       | Performance in Python is about 1/30x the speed, which is not
       | acceptable.
       | 
       | People pretend that you can hand-wave the performance problem
       | away by using NumPy. Please tell me how to vectorize this Python
       | code. Go on, I'm waiting.
       | 
       | You can't vectorize the `if result[i]` because that controls
       | whether the inner for-loop executes; so it must execute in pure
       | Python. For the inner loop, you can vectorize that by creating a
       | huge temporary array and then AND it with the result array, but
       | that is extremely memory-inefficient compared to flipping bits of
       | the result array in place, and probably messes up the cache too.
       | 
       | Alternatively, you can run the code in PyPy, but that usually
       | gives a speed-up of maybe 3x, which is nowhere near enough to
       | recover the 30x speed penalty.
       | 
       | Another example is that NumPy is not going to help you implement
       | your own bigint library, because that also requires a lot of
       | custom logic that executes between loop iterations. Meanwhile,
       | I've implemented bigint in pure Java with acceptable performance
       | and without issues.
       | 
       | Again, my point is that anytime you venture into custom
       | numerical/logical code that is not the simple `C = A + B`, you
       | enter a world of pain when it comes to Python performance or
       | vectorization.
        
         | brosco wrote:
         | Have you heard of JIT libraries like numba
         | (https://github.com/numba/numba)? It doesn't work for all
         | python code, but can be helpful for the type of function you
         | gave as an example. There's no need to rewrite anything, just
         | add a decorator to the function. I don't really know how
         | performance compares to C, for example.
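          | 
          | E.g. something like this should work (an untested sketch;
          | Numba prefers NumPy arrays over Python lists, so the sieve
          | is rewritten around one):
          | 
          |     import numpy as np
          |     from numba import njit
          | 
          |     @njit
          |     def sieve_primeness(limit):
          |         result = np.ones(limit + 1, dtype=np.bool_)
          |         result[0] = False
          |         for i in range(2, limit + 1):
          |             if result[i]:
          |                 for j in range(i * i, limit + 1, i):
          |                     result[j] = False
          |         return result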
        
         | Jtsummers wrote:
         | I'm not a numpy expert, I just use it on occasion but this
         | works and eliminates the explicit inner loop, but not the outer
         | loop. It collects the list of prime numbers while removing
         | multiples of the prime from the numpy array of numbers:
         | 
          |     import numpy as np
          | 
          |     def sieve_primes(limit):
          |         nums = np.arange(limit + 1)
          |         nums[1] = 0
          |         primes = []
          |         # `if nums[i] != 0` could just be `if nums[i]:`
          |         # since 0 is false-y
          |         for i in range(2, limit):
          |             if nums[i] != 0:
          |                 primes.append(i)
          |                 nums = nums * (np.mod(nums, i) != 0)
          |         print(primes)
          | 
          |     sieve_primes(1000)
         | 
         | This takes advantage of the fact that True and False are
         | treated as 1 and 0 in Python for the multiplication.
         | 
          | EDIT: The other solutions people have shown are closer to the
          | sieve itself than my solution. I was having trouble with the
          | slicing notation; now that I've seen how that should be done,
          | I'd redo the above as:
          | 
          |     def sieve_primes(limit):
          |         nums = np.arange(limit + 1)
          |         nums[1] = 0
          |         for i in range(2, limit):
          |             if nums[i] != 0:
          |                 nums[i+i::i] = 0
          |         print(np.nonzero(nums))
         | 
         | However, the first version is faster by my testing so I'd end
         | up just going back to it if I really cared about performance.
        
         | xg15 wrote:
          | > _Please tell me how to vectorize this Python code. Go on,
          | I'm waiting._
         | 
         | You can't entirely get rid of the loops, but at least the inner
         | one can be vectorized, I think. That would reduce the "pythonic
         | runtime" from squared to linear.
         | 
          | Maybe something like this?
          | 
          |     result = np.full((limit + 1,), True, dtype=bool)
          |     result[:2] = False
          |     indices = np.arange(len(result))
          |     for i in range(2, len(result)):
          |         multiples = indices[i:] * i
          |         # NumPy rejects out-of-bounds integer indices,
          |         # so clamp before assigning:
          |         result[multiples[multiples < len(result)]] = False
          | 
          | This is just out of my head, so may contain bugs. (NumPy does
          | not accept out-of-bounds indices, hence the clamping mask.)
        
         | nonameiguess wrote:
         | Ironically enough, I wrote a Sieve in NumPy 12 years ago or so
         | in grad school messing around with Project Euler. I haven't
         | touched it since and have not regularly used either NumPy or
         | even Python in quite some time, but I still have the solutions
          | in a private Github repo. The code is:
          | 
          |     def primes(n):
          |         """Input: A natural number n.
          |         Output: An array of primes, p < n."""
          |         assert n >= 2
          |         sieve = np.ones(n // 2, dtype=bool)
          |         for i in range(3, int(n ** 0.5) + 1, 2):
          |             if sieve[i // 2]:
          |                 sieve[i * i // 2::i] = False
          |         return np.r_[2, 2 * np.nonzero(sieve)[0][1::] + 1]
         | 
         | It still relies on looping and I have no idea if you can fully
         | vectorize everything, but it does at least get quite a bit of
         | speedup thanks to the ability to index one array using another,
         | so I can avoid the inner loop you have to use in pure Python.
         | 
         | I have absolutely no clue what that final line is doing and I'm
         | not sure I understood it 12 years ago, either.
        
           | gjm11 wrote:
            | sieve[i] is nonzero precisely when 2i+1 is an odd prime _or_
            | i=0 (because it's initialized to all-1 and nothing ever
            | zeros the first element).
           | 
           | So np.nonzero(sieve) is all the indices for which i=0 or 2i+1
           | is an odd prime.
           | 
           | Except that the behaviour of np.nonzero is a bit weird: it
           | returns a tuple of arrays, one for each axis. (So, e.g., if
           | you feed it a 2d array it gives you (a,b) such that the
           | nonzero elements are at positions (a[0],b[0]), (a[1],b[1]),
           | etc.) That's kinda reasonable behaviour, but it means that
           | for the simple case of a 1-d array you get a 1-element tuple.
           | 
            | So np.nonzero(sieve)[0] is _actually_ all the indices i for
            | which i=0 or 2i+1 is an odd prime, and now you know why I
            | didn't write "all the indices i for which ..." above.
           | 
           | The case i=0 would correspond to 1 being prime, which it
           | isn't considered to be. So we throw that one out.
           | 
           | So np.nonzero(sieve)[0][1::] is all the indices i for which
           | 2i+1 is an odd prime.
           | 
           | And so 2*np.nonzero(sieve)[0][1::]+1 is all the 2i+1 for
           | which 2i+1 is an odd prime; that is, all the odd primes.
           | 
           | I don't think I've encountered np.r_ before, but it's pretty
           | obvious what it's doing: prepending 2 to that array, so now
           | it's "2 followed by all the odd primes" or, more simply, all
           | the primes.
           | 
           | (Obviously, in the above phrases like "all the primes" mean
           | "all the primes up to the limit defined by the size of the
           | array".)
        
         | WCSTombs wrote:
         | > Please tell me how to vectorize this Python code. Go on, I'm
         | waiting.
         | 
          | Here's a start:
          | 
          |     import numpy as np
          | 
          |     def sieve(limit):
          |         result = np.ones(limit + 2, dtype=bool)
          |         result[0] = False
          |         result[1] = False
          |         for i in range(2, limit + 1):
          |             if result[i]:
          |                 yield i
          |                 result[::i] = False
          | 
          |     for prime in sieve(100):
          |         print(prime)
         | 
         | As you said, you can't vectorize the outer loop because each
         | iteration depends on the result of previous iterations, so I
         | think you can't do too much better than that with this
         | algorithm. (There are a few easy algorithm optimizations, but I
         | think that's kind of orthogonal to your point, and anyway you
         | can do them in any language.)
         | 
         | I would expect there's still a significant performance gap
         | between that and, say, a simple C implementation, but that
         | shouldn't be surprising, and personally I haven't encountered
         | these supposed droves of people claiming that NumPy fully
         | solves the performance gap. From what I've seen there was a
         | general consensus that vectorizing with NumPy can significantly
         | tighten the gap but can rarely fully close it.
        
         | __mharrison__ wrote:
         | See Numba or Cython...
        
       | DrFalkyn wrote:
       | Back when our lab transitioned from Matlab to Python I used
       | numpy/scipy quite a bit. I remember heavily leaning on
       | numpy.reshape to get things to work correctly. In some cases I
       | did resort to looping.
        
       | brosco wrote:
       | Compared to Matlab (and to some extent Julia), my complaints
       | about numpy are summed up in these two paragraphs:
       | 
       | > Some functions have axes arguments. Some have different
       | versions with different names. Some have Conventions. Some have
       | Conventions and axes arguments. And some don't provide any
       | vectorized version at all.
       | 
       | > But the biggest flaw of NumPy is this: Say you create a
       | function that solves some problem with arrays of some given
       | shape. Now, how do you apply it to particular dimensions of some
       | larger arrays? The answer is: You re-write your function from
       | scratch in a much more complex way. The basic principle of
       | programming is abstraction--solving simple problems and then
       | using the solutions as building blocks for more complex problems.
       | NumPy doesn't let you do that.
       | 
       | Usually when I write Matlab code, the vectorized version just
       | works, and if there are any changes needed, they're pretty minor
       | and intuitive. With numpy I feel like I have to look up the
       | documentation for every single function, transposing and
       | reshaping the array into whatever shape that particular function
       | expects. It's not very consistent.
        
         | breakds wrote:
         | For the second problem, if I understand it correctly, you might
         | want to try `vmap` from jax.
        
         | jampekka wrote:
         | Matlab's support for more than 2 dimensions in arrays is so bad
         | that it's rare to encounter the situations lamented in TFA.
        
           | IshKebab wrote:
           | Well it's not Tenslab!
        
           | treefarmer wrote:
           | Yeah, in case anyone else has the misfortune of having to
           | work with multi-dimensional data in MATLAB, I'd recommend the
           | Tensor Toolbox, Tensorlab, or the N-way Toolbox.
        
       | phronimos wrote:
       | Numba is a great option for speeding up (vectorizing) loops and
       | NumPy code, apart from CuPy and JAX. Xarray is also worth trying
       | for tensors beyond 2 dimensions.
        
         | KeplerBoy wrote:
          | True, a nice JIT compiler solves a lot of the problems
          | mentioned in the article. These days I often use jax.jit for
          | the GPU support and numpy-like syntax, with the added benefit
          | of fast loop constructs.
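          | 
          | E.g. a loop that stays fast inside the compiled function (a
          | small sketch; jax.lax.fori_loop keeps the loop in the traced
          | program instead of unrolling it in Python):
          | 
          |     import jax
          |     import jax.numpy as jnp
          | 
          |     @jax.jit
          |     def running_sum(x):
          |         # acc starts as x; each step adds the previous
          |         # running total to the current element
          |         def body(i, acc):
          |             return acc.at[i].set(acc[i - 1] + x[i])
          |         return jax.lax.fori_loop(1, x.shape[0], body, x)
          | 
          |     print(running_sum(jnp.arange(5.0)))  # [0. 1. 3. 6. 10.]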
        
       | vector_spaces wrote:
       | My main issue with numpy is that it's often unclear what
       | operations will be vectorized or how they will be vectorized, and
       | you can't be explicit about it the way you can with Julia's dot
       | syntax.
       | 
       | There are also lots of gotchas related to types returned by
       | various functions and operations.
       | 
       | A particularly egregious example: for a long time, the class for
       | univariate polynomial objects was np.poly1d. It had lots of
       | conveniences for doing usual operations on polynomials
       | 
       | If you multiply a poly1d object P on the right by a complex
       | scalar z0, you get what you probably expect: a poly1d with
       | coefficients scaled by z0.
       | 
       | If however you multiply P on the left by z0, you get back an
       | array of scaled coefficients -- there's a silent type conversion
       | happening.
       | 
        | So
        | 
        |     P * z0  # gives a poly1d
        |     z0 * P  # gives an array
        | 
        | I know that this is due to Python's operator dispatch rules and
        | a laissez-faire approach to datatypes, but it's fairly ugly to
        | debug something like this!
       | 
       | Another fun gotcha with poly1d: if you want to access the leading
       | coefficient for a quadratic, you can do so with either P.coef[0]
       | or P[2]. No one will ever get these confused, right?
       | 
       | These particular examples aren't really fair because the numpy
       | documentation describes poly1d as a part of the "old" polynomial
       | API, advising new code to be written with the `Polynomial` API --
       | although it doesn't seem it's officially deprecated and no
       | warnings are emitted when you use poly1d.
       | 
        | Anyway, there are similar warts throughout the library: lots of
        | gotchas in the shape of silent type conversions, and
        | inconsistent datatypes returned by the same method depending on
        | its inputs, that are downright nightmarish to debug.
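        | 
        | If anyone wants to reproduce the poly1d gotcha, this is my
        | recollection of the minimal case (as I remember it, the left-
        | multiplication bites when z0 is a NumPy scalar, e.g. the
        | result of a prior NumPy computation, rather than a plain
        | Python number):
        | 
        |     import numpy as np
        | 
        |     P = np.poly1d([3, 2, 1])    # 3x^2 + 2x + 1
        |     z0 = np.complex128(2 + 1j)  # a NumPy scalar
        | 
        |     print(type(P * z0))  # numpy.poly1d
        |     print(type(z0 * P))  # numpy.ndarray -- silent conversion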
        
       | WD-42 wrote:
       | The answer to all these complaints is simple: use APL. Or rather
       | these days, BQN.
        
       | CreRecombinase wrote:
        | It's kind of wild how much work really smart people will do to
        | get Python to act like Fortran. This is why R is such a great
        | language IMO. Get your data read and arrays in order in a
        | dynamic, Scheme-like language, then just switch to Fortran and
        | write actual Fortran like an adult.
        
         | t-kalinowski wrote:
         | Or, don't even write the fortran manually, just transpile the R
         | function to fortran: https://github.com/t-kalinowski/quickr
        
         | dismalaf wrote:
         | This. Writing Fortran is easy as hell nowadays.
         | 
         | But yeah, I learned Fortran to use with R lol. And it is nice.
         | Such easy interop.
        
         | _Wintermute wrote:
         | R kinda sucks at anything that isn't a dataframe though.
        
           | okanat wrote:
            | R sucks for any real programming task where you don't
            | hardcode every single thing. It sucks at loading modules
            | from your own project. It sucks at finding the path of the
            | script you're executing.
           | 
            | Basically everything is stateful in R. You call standard
            | library functions to install third party libraries, ffs.
            | And that operation can invoke your C compiler.
           | 
           | Putting R and a repo in a docker container to run it in a
           | pipeline where nothing is hardcoded (unlike our datascience
           | guy's workspace) was the worst nightmare we had to deal with.
        
       | WCSTombs wrote:
       | If your arrays have more than two dimensions, please consider
       | using Xarray [1], which adds dimension naming to NumPy arrays.
        | Broadcasting and alignment then become automatic without needing
        | to transpose, add dummy axes, or anything like that. I believe
       | that alone solves _most_ of the complaints in the article.
       | 
       | Compared to NumPy, Xarray is a little thin in certain areas like
       | linear algebra, but since it's very easy to drop back to NumPy
       | from Xarray, what I've done in the past is add little helper
       | functions for any specific NumPy stuff I need that isn't already
       | included, so I only need to understand the NumPy version of the
       | API well enough one time to write that helper function and its
       | tests. (To be clear, though, the majority of NumPy ufuncs are
       | supported out of the box.)
       | 
       | I'll finish by saying, to contrast with the author, I don't
       | dislike NumPy, but I do find its API and data model to be
       | insufficient for truly multidimensional data. For me three
       | dimensions is the threshold where using Xarray pays off.
       | 
       | [1] https://xarray.dev
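        | 
        | For a flavor of what that looks like (a minimal sketch with
        | made-up dimension names):
        | 
        |     import numpy as np
        |     import xarray as xr
        | 
        |     da = xr.DataArray(
        |         np.random.rand(10, 4, 5, 6),
        |         dims=("time", "x", "y", "z"),
        |         coords={"x": [0.0, 0.5, 1.0, 1.5]},
        |     )
        | 
        |     # Select and reduce by name: no axis numbers, no
        |     # transposes, no dummy axes.
        |     res = da.sel(x=0.5).isel(time=-1).mean(["y", "z"])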
        
         | mgunyho wrote:
         | Seconded. Xarray has mostly replaced bare NumPy for me and it
         | makes me so much more productive.
        
         | ddtaylor wrote:
         | Is there anything similar to this for something like
         | Tensorflow, Keras or Pytorch? I haven't used them super
         | recently, but in the past I needed to do all of the things you
         | just described in painful to debug ways.
        
           | mgunyho wrote:
           | For Torch, I have come across Named Tensors, which should
           | work in a similar way:
           | https://docs.pytorch.org/docs/stable/named_tensor.html
           | 
           | The docs say that it's a prototype feature, and I think it
           | has been that way for a few years now, so no idea how
           | production-ready it is.
        
             | xiphias2 wrote:
              | It's a much worse API than Xarray's; it seems like
              | somebody should build something like Xarray on top of
              | PyTorch.
        
           | toth wrote:
           | For pytorch the analogue is Named Tensors, but it's a
           | provisional feature and not supported everywhere.
           | 
           | https://docs.pytorch.org/docs/stable/named_tensor.html
        
           | SimplyUnknown wrote:
           | I really like einops. This works for numpy, pytorch and
            | keras/tensorflow and has easy named transpose, repeat, and
            | einsum operations.
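            | 
            | E.g. (a small sketch of the named notation):
            | 
            |     import numpy as np
            |     from einops import rearrange, repeat
            | 
            |     x = np.zeros((32, 64, 3))               # h, w, c
            |     y = rearrange(x, "h w c -> c h w")      # named transpose
            |     z = repeat(x, "h w c -> b h w c", b=8)  # explicit repeat
            |     print(y.shape, z.shape)  # (3, 32, 64) (8, 32, 64, 3)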
        
         | michaelbarton wrote:
         | Thanks for sharing this library. I will give it a try.
         | 
          | For a while I had a feeling that I was perhaps a little crazy
          | for seeming to be the only person to really dislike the use
          | of things like 'array[:, :, None]' and so forth.
        
         | insane_dreamer wrote:
         | along those lines, for biosignal processing, NeuroPype[0] also
         | builds on numpy and implements named axes for n-dimensional
         | tensors, with the ability to store per-element data (i.e.
         | channel names, positions, etc.) for each axis
         | 
         | [0] https://www.neuropype.io/docs/
        
         | renjimen wrote:
         | Xarray is great. It marries the best of Pandas with Numpy.
         | 
         | Indexing like `da.sel(x=some_x).isel(t=-1).mean(["y", "z"])`
         | makes code so easy to write and understand.
         | 
         | Broadcasting is never ambiguous because dimension names are
         | respected.
         | 
         | It's very good for geospatial data, allowing you to work in
         | multiple CRSs with the same underlying data.
         | 
         | We also use it a lot for Bayesian modeling via Arviz [1], since
         | it makes the extra dimensions you get from sampling your
         | posterior easy to handle.
         | 
         | Finally, you can wrap many arrays into datasets, with common
         | coordinates shared across the arrays. This allows you to select
         | `ds.isel(t=-1)` across every array that has a time dimension.
         | 
         | [1] https://www.arviz.org/en/latest/
        
         | fsndz wrote:
         | xarray is nice
        
         | inasio wrote:
         | Life goes full circle sometimes. I remember that Numpy roughly
         | came out of the amalgamation of the Numeric and Numarray
         | libraries; I want to imagine that the Numarray people kept
         | fighting these past 20 years to prove they were the right
         | solution, at some point took some money from Elon Musk and
         | renamed to Xarray [0], and finally started beating Numpy.
         | 
         | [0] most of the above is fiction
        
       | neonsunset wrote:
       | Sighs in F# and Julia
        
         | sundarurfriend wrote:
         | I understand the Julia part, but does F# have any particularly
         | good facilities or libraries for this kind of (fast) array
          | manipulation? It's been getting a good amount of love on HN
          | lately, but I don't know of any specific strengths of it
          | (beyond the general sense of "functional language with access
          | to the .NET ecosystem").
        
           | int_19h wrote:
           | https://github.com/fsprojects/awesome-
           | fsharp/blob/main/READM...
        
             | sundarurfriend wrote:
             | I was more curious why they chose F# among the many many
             | languages out there, and this kind of list unfortunately is
             | not very useful for that - similar lists exist in pretty
             | much every awesome-$language page out there. They don't
              | give us any idea of how robust or featureful the
              | ecosystem and support are, which are the things that
              | would differentiate "this too supports it, kinda" (which
              | is every major language out there) from "this is really
              | well-supported and ergonomic".
        
       | semiinfinitely wrote:
       | all issues raised are addressed by jax.vmap
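        | 
        | E.g. the kind of batched solve discussed in the article (a
        | sketch; vmap maps the single-system function over the leading
        | batch axis):
        | 
        |     import jax
        |     import jax.numpy as jnp
        | 
        |     A = jnp.eye(5) + jnp.zeros((100, 5, 5))  # 100 5x5 systems
        |     x = jnp.ones((100, 5))
        | 
        |     # solve one system; vmap handles the batch dimension
        |     y = jax.vmap(jnp.linalg.solve)(A, x)
        |     print(y.shape)  # (100, 5)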
        
         | patrickkidger wrote:
         | Went scrolling looking for this! Most of the article is about
         | problems solved in JAX.
         | 
         | Also worth noting the Array API standard exists now. This is
         | generally also trying to straighten out the sharp edges.
        
           | amensch wrote:
           | Same here, beautiful solution
        
       | the__alchemist wrote:
       | I'm with the author here. Great for simple use cases. Anything
       | beyond that, and I find the syntax inscrutable. It's like using a
       | terse, feature-rich DSL like Regex. Good luck deciphering someone
       | else's numpy code (or your own from a month back) unless you
       | really take the time to commit it to memory.
        
       | aborsy wrote:
       | My main problem with numpy is that the syntax is verbose. I know
       | from the programming language perspective it may not be
       | considered a drawback (might even be a strength). But in
        | practice, the syntax is a pain compared to Matlab or Julia. The
        | code for the latter two is easier to read and understand, and
        | more consistent with math notation.
        
         | nis251413 wrote:
         | The syntax of actual array languages can be beautifully concise
          | and expressive. You can express mathematical formulas in ways
          | that make sense when you read a single line, and once you get
         | used to the notation and some originally non-intuitive quirks
         | (from a programming background perspective), you can write in
         | very few lines what otherwise would take you several lines of
         | rather ugly, less readable code.
         | 
         | In my view, python+numpy is not actually an array language.
         | Numpy as a library adds vectorization operations to python in
         | order to help with speed. This is different. It does not
         | (intend to) bring the advantages that array language syntax
         | has, even if it was a bit more consistent.
        
           | throwaway314155 wrote:
           | Does numpy market itself as an array language or something?
        
       | sunrunner wrote:
       | "Young man, in numpy you don't understand things. You just get
       | used to them." -- John von Neumann (probably)
        
         | xg15 wrote:
         | I'm pretty sure it was Einstein.
        
           | sunrunner wrote:
           | Darn, I thought it was Feynman and checked it beforehand.
           | Maybe it was Von Neumann quoting Einstein (himself
           | quoting...)
        
             | gjm11 wrote:
             | I think xg15 was just making a joke about the tendency for
             | every quotation that could possibly be assigned to Einstein
             | to be assigned to Einstein.
             | 
             | (Perhaps you too are making a joke and it's gone over my
             | head, in which case my apologies.)
        
           | JJMcJ wrote:
           | It has a von Neumann sound to it.
        
       | bee_rider wrote:
       | Numpy is mostly just an interface to BLAS/Lapack, but for Python,
       | right? BLAS/Lapack aren't clever libraries for doing a ton of
       | small operations, they are simple libraries for doing the easy
       | thing (operations on big dense matrices) about as well as the
       | hardware can.
       | 
       | Numpy is what it is. Seems more disappointing that he had trouble
       | with the more flexible libraries like Jax.
       | 
       | Anyway there's a clear split between the sort of functionality
       | that Numpy specializes in and the sort that Jax does, and they
       | don't benefit much from stepping on each other's toes, right?
        
       | ChrisRackauckas wrote:
       | One of the reasons why I started using Julia was because the
       | NumPy syntax was so difficult. Going from MATLAB to NumPy I felt
       | like I suddenly became a mediocre programmer, spending less time
       | on math and more time on "performance engineering" of just trying
       | to figure out how to use NumPy right. Then when I went to Julia
       | it made sense to vectorize when it felt good and write a loop
        | when it felt good. Because both are fast, you focus on what
        | makes the code easiest to read and understand. This blog post
        | encapsulates exactly that experience and feeling.
       | 
        | Also, treating things like `np.linalg.solve` as a black box
        | that is the fastest thing in the world, which you could never
        | beat, so please mangle your code to call it correctly... that's
        | just wrong.
       | There's many reasons to build problem specific linear algebra
       | kernels, and that's something that is inaccessible without going
       | deeper. But that's a different story.
        
         | Bostonian wrote:
         | "Then when I went to Julia it made sense to vectorize when it
         | felt good and write a loop when it felt good. Because both are
         | fast, focus on what makes the code easiest to read an
         | understand."
         | 
         | This is true of modern Fortran also.
        
           | sundarurfriend wrote:
           | The amount of work that's been put into Fortran by the
           | LFortran [1] folks and others in recent years [2] is nothing
           | short of absurdly impressive!
           | 
           | [1] https://lfortran.org/
           | 
           | [2] "in recent years" because that's when I became aware of
           | this stuff, not to say there wasn't effort before then
        
             | pklausler wrote:
             | If you're going to give praise, include the GNU Fortran
             | developers, who have really set the standard for open
             | source production quality Fortran compiler development.
        
               | sundarurfriend wrote:
               | That sounds great, kudos to them too!
               | 
                | The only times I set foot in the Fortran world are when
                | some Julia-related development touches it [1] [2], and
                | then I peruse some surrounding links and related
                | projects - so I have
               | very little familiarity with GNU Fortran myself.
               | 
               | [1] https://fortran-lang.discourse.group/t/update-on-
               | prima-a-jul... [2] https://discourse.julialang.org/t/ann-
               | julia-back-end-for-lfo...
        
           | ChrisRackauckas wrote:
           | Indeed, I also like modern Fortran.
        
         | jampekka wrote:
         | > Going from MATLAB to NumPy I felt like I suddenly became a
         | mediocre programmer, spending less time on math and more time
         | on "performance engineering" of just trying to figure out how
         | to use NumPy right.
         | 
         | Matlab is about as slow without readily vectorized operations
         | as Python.
         | 
         | Slowness of Python is a huge pain point, and Julia has a clear
         | advantage here. Sadly Julia is practically just unusable beyond
         | quite niche purposes.
         | 
            | Python does now have quite serviceable JIT hacks that let
            | one escape vectorization tricks, but they are still hacks,
            | and a performant Python alternative would be very welcome.
            | Sadly there aren't any.
        
           | AnotherGoodName wrote:
            | One of the things I suspect will happen very soon is that
            | all languages will become as fast as each other, and your
            | reason to use one over the other for performance might not
            | exist. I know this sounds like optimistic, wishful
            | thinking, but hear me out here:
           | 
           | Modern AIs are literal translation engines. That's their
           | origin. The context that lets them do what they do was
           | originally there to allow good translations. It doesn't
           | matter if the translation is programming language output, or
           | actual language output. I can today ask an AI to rewrite a
           | chunk of Matlab code into Rust and it works! There's even
           | code generators that will utilize the GPU where sensible.
           | 
            | We're really not that far from being able to write code in
            | any language and, transparently behind the scenes, have it
            | actually running on a more performant backend when needed.
            | Ideally you keep the Matlab/Python frontend and the
            | translation happens in the intermediate layers in a two-way
            | fashion, so step-through/debug works as expected.
           | 
           | Based on playing with the above steps manually with good
           | results we're almost at the stage of just needing some glue
           | to make it all work. Write in Matlab/Python, and run as fast
           | as any LLVM backed language.
        
             | codethief wrote:
             | Sounds like transpiling Typescript to JavaScript, except
             | that you translate to a vastly different language, thereby
             | making you, the programmer, largely unable to reason about
             | the performance characteristics of your code ("Can the
             | transpiler optimize this Python loop or nah?"), and you
             | also throw in the indeterminism of LLMs. Oooof, I'm not
             | sure I'd want that.
        
               | AnotherGoodName wrote:
               | When you're using Python the type of performance
               | characteristics you're usually thinking of are "can i
               | call an external library that runs this on the GPU" or
               | "is this log(n) or n^2 complexity?" and such language
               | translations shouldn't change those parts.
               | 
                | As in, the language is regularly benchmarked at 10-100x
                | slower than other languages. It's not a language for
                | caring about subtle performance characteristics outside
                | "am I calling the external library correctly or not?"
                | (as in this article) or "is this algorithm exponential
                | time?".
        
           | Zambyte wrote:
           | > Sadly Julia is practically just unusable beyond quite niche
           | purposes.
           | 
           | Why? Just the ecosystem?
        
             | jampekka wrote:
             | Mainly the lack of practical support for non-REPL/Notebook
             | usage and interoperability.
        
           | bandrami wrote:
           | R
        
             | jampekka wrote:
             | R is for sure not a Python replacement. And it's even
             | slower than Python.
        
         | abdullahkhalids wrote:
         | The reason is quite simple. Julia is a language designed for
         | scientific computation. Numpy is a library frankenstein-grafted
         | onto a language that isn't really designed for scientific
         | computation.
         | 
          | We can only hope that Julia somehow wins and those of us
          | forced to work in Python because of network effects can be
          | freed.
        
         | amluto wrote:
         | Is MATLAB materially different? Loops are slow (how slow
         | depends on the version), and the fastest thing in the world is
         | the utter perfection of the black box called '\'.
        
       | munchler wrote:
       | I have to agree with this as someone coming from a strongly-typed
       | background (F#). PyTorch and NumPy are very flexible and
        | powerful, but their APIs are insanely underspecified because
       | every function seems to take multiple combinations of vaguely-
       | typed objects. The library just figures out the right thing to do
       | at runtime using broadcasting or other magic.
       | 
       | This kind of "clever" API seems to be both a benefit and curse of
       | the Python ecosystem in general. It makes getting started very
       | easy, but also makes mastery maddeningly difficult.
        
         | coolcase wrote:
         | Broadcasting is good, but would be nice if it were explicit.
         | 
          | Maybe it being explicit would get in the way and become
          | inelegant once you get some experience.
        
         | kccqzy wrote:
         | Broadcasting is a useful and powerful feature. It's precisely
         | specified and easily learned. However, very few type systems in
         | real world languages are powerful enough to express the concept
         | of broadcasting in the type system.
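          | 
          | The whole specification more or less fits in two rules:
          | align shapes from the right, and axes of size 1 (or missing
          | axes) stretch to match. For example:
          | 
          |     import numpy as np
          | 
          |     a = np.ones((100, 5))
          |     b = np.ones((5,))      # treated as (1, 5)
          |     c = np.ones((100, 1))  # stretched along the last axis
          | 
          |     print((a + b).shape)  # (100, 5)
          |     print((a + c).shape)  # (100, 5)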
        
       | lvl155 wrote:
       | I am not a big fan of Python data libraries. They're not cohesive
       | in style across the board. Probably why I found R to be a better
       | "classroom" solution. Julia is nice and so is Mathematica for
       | purely math (and hat tip to Maple).
        
       | a_t48 wrote:
       | My issue with it is how easy it is to allocate all over the place
       | if you forget to use inplace operations. It's even worse with
       | cupy - rather than applying a series of operations to some data
       | to produce some other data, you end up producing a set of data
       | for each operation. Yes, there are workarounds, but they aren't
       | as ergonomic (cupy.fuse() almost does the right thing, cleanly,
       | but is a step you have to remember to use, and doesn't really
       | work for anything that requires multiple shapes of array).
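        | 
        | To make the allocation point concrete (a NumPy sketch; the
        | same idea applies to cupy):
        | 
        |     import numpy as np
        | 
        |     x = np.random.rand(1_000_000)
        | 
        |     y = 2.0 * x + 1.0  # allocates a fresh array per operation
        | 
        |     out = np.empty_like(x)  # reuse one preallocated buffer
        |     np.multiply(x, 2.0, out=out)
        |     np.add(out, 1.0, out=out)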
        
       | sundarurfriend wrote:
        | > D = 1/(L*M) * np.einsum('klm,ln,km->kn', A, B, C)
       | 
        | The first time I came across Einsums was via the Tullio.jl
        | package, and it seemed like magic to me. I believe the
        | equivalent of this would be:
        | 
        |     @tullio D[k, n] = 1/(L*M) * A[k, l, m] * B[l, n] * C[k, m]
        | 
        | which is really close to the mathematical notation.
       | 
        | To my understanding, template strings from PEP 750 will allow
        | for something like:
        | 
        |     D = 1/(L*M) * np.einsum(t'{A}[k,l,m] * {B}[l,n] * {C}[k,m]')
       | 
       | right? If so, that'd be pretty neat to have.
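        | 
        | For reference, the quoted NumPy line runs as-is today; the
        | dimension sizes below are made up for illustration:
        | 
        |     import numpy as np
        | 
        |     K, L, M, N = 4, 5, 6, 3
        |     A = np.random.rand(K, L, M)
        |     B = np.random.rand(L, N)
        |     C = np.random.rand(K, M)
        | 
        |     # sum over l and m, keep k and n
        |     D = 1 / (L * M) * np.einsum('klm,ln,km->kn', A, B, C)
        |     print(D.shape)  # (4, 3)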
        
       | SirHumphrey wrote:
        | The main problem (from my point of view) with the Python data
        | science ecosystem is a complete lack of standardization on
        | anything.
       | 
       | You have ten different libraries that try to behave like 4 other
        | languages, and the only point of standardization is that there
        | is usually something like a .to_numpy() function. This means
        | that most
       | of the time I was not solving any specific problem related to
       | data processing, but just converting data from a format one
       | library would understand to something another library would
       | understand.
       | 
        | In Julia (a language with its own problems, of course) things
        | mostly just work. The library for calculating uncertainties
        | interacts well with a library handling units, and all this
        | works fine with the plotting library, or libraries solving
        | differential equations, etc. In Python, this required quite a
        | lot of boilerplate.
        
         | HdS84 wrote:
         | R with its 4(?) class systems enters the chat.
        
           | ChrisRackauckas wrote:
           | It's at least 5 at this point.
        
             | rienbdj wrote:
             | In defense of R the class systems do have different
             | characteristics and they are not deeply embedded in the
             | language or anything.
        
         | Evidlo wrote:
         | Nobody has mentioned array-api (and data-apis more generally),
         | which is trying to standardize the way people interact with
         | arrays across the Python ecosystem.
         | 
         | https://github.com/data-apis/array-api
         | 
         | https://data-apis.org/blog/announcing_the_consortium/
        
       | bee_rider wrote:
       | A dumb thought: technically scipy has a "solve_banded" function
        | that does a banded solve. He could easily recast his problem
        | into a single big banded problem, I guess, just with some silly
        | explicit zeros added. I wonder how the performance of that
        | would compare to iterating over a bunch of tiny solves.
       | 
       | Of course, it would be nice if scipy had a block diagonal solver
        | (maybe it does?). But yeah, I mean, of course it would be nice
        | if my problem were always built-in functionality of the
        | library, lol.
       | 
       | Maybe a bsr_matrix could work.
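        | 
        | scipy.sparse does have block_diag, so here's one hedged sketch
        | of the "one big solve" idea (no claim about how it benchmarks
        | against a loop of tiny dense solves):
        | 
        |     import numpy as np
        |     from scipy import sparse
        |     from scipy.sparse.linalg import spsolve
        | 
        |     # 100 small 5x5 systems, made diagonally dominant
        |     A = np.random.rand(100, 5, 5) + 5 * np.eye(5)
        |     x = np.random.rand(100, 5)
        | 
        |     # one sparse block-diagonal system instead of 100 solves
        |     big_A = sparse.block_diag(list(A), format="csr")
        |     y = spsolve(big_A, x.ravel()).reshape(100, 5)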
        
       | Imnimo wrote:
       | The trouble is that I can never tell when the cause of my issues
       | is "numpy's syntax/documentation is bad" and when it's "I am very
       | bad at thinking about broadcasting and shape arithmetic".
        
       | FrameworkFred wrote:
       | I'm actually super-interested to see the next post.
       | 
       | TBH, if you would've asked me yesterday if I'm the sort of person
       | who might get sucked in by a cliffhanger story about a numpy
       | replacement, I'm pretty sure I would've been an emphatic no.
       | 
       | But I have, in fact, just tried random things in numpy until
       | something worked...so, you know...tell me more...
        
       | __mharrison__ wrote:
       | Multiple dimensions (more than 2) are hard.
       | 
       | I was at a conference the other day, and my friend (a very smart
       | professor) was asking me if it would be possible to move away
       | from Xarray to Pandas or Polars...
       | 
        | Perhaps using Numba or Cython (with loops) might make it fast
        | while being less prone to confusion.
       | 
       | Luckily for me, I mostly stay in tabular data (< 3 dimensions).
        
       | threeducks wrote:
       | I thought I'd do something smart and inline all the matrix
       | multiplications into the einsums of the vectorized multi-head
       | attention implementation from the article and set
       | optimize="optimal" to make use of the optimal matrix chain
       | multiplication algorithm
       | https://en.wikipedia.org/wiki/Matrix_chain_multiplication to get
       | a nice performance boost.                   def
       | multi_head_attention_golfed(X, W_q, W_k, W_v, W_o,
       | optimize="optimal"):             scores =
       | np.einsum('si,hij,tm,hmj->hst', X, W_q, X, W_k,
       | optimize=optimize)             weights =
       | softmax(W_k.shape[-1]**-0.5 * scores, axis=-1)
       | projected = np.einsum('hst,ti,hiv,hvd->shd', weights, X, W_v,
       | W_o, optimize=optimize)             return
       | projected.reshape(X.shape[0], W_v.shape[2])
       | 
       | This is indeed twice as fast as the vectorized implementation,
       | but, disappointingly, the naive implementation with loops is even
       | faster. Here is the code if someone wants to figure out why the
       | performance is like that: https://pastebin.com/raw/peptFyCw
       | 
       | My guess is that einsum could do a better job of considering
       | cache coherency when evaluating the sum.
        
       | jamesblonde wrote:
        | In data for ML, everything has switched from NumPy (Pandas) to
        | Arrow (Polars, DuckDB, Spark, Pandas 2.x, etc). However, Scikit-
        | Learn is still a holdout, so it's Arrow from your data sources
        | all the way to pre-processing pipelines in Scikit-Learn, where
        | you have to go back to NumPy. In practice, it now makes more
        | sense to separate feature pipelines in Arrow from training
        | pipelines with Pandas/NumPy and Scikit-Learn.*
       | 
       | *This is ML, not Deep Learning or Transformers.
        
       | ris wrote:
       | I would much _much_ prefer the cited  "awful" syntax over the
       | proposed loop syntax any day. Don't make me run a little virtual
       | machine in my head to figure out what the end result of a block
       | of code is going to be.
        
       | eurekin wrote:
       | That's one area where LLMs have hugely helped. You can just ask
       | one, and ask it to generate tests to verify.
        
       | srean wrote:
       | I, for one, like NumPy's broadcasting far better than Matlab's.
       | In NumPy it's a logical operation, with no unnecessary
       | memory/copying cost.
       | 
       | Last time I checked Matlab (surely a decade ago now), it
       | actually filled memory with copied data.
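       | 
       | You can see the no-copy behaviour directly (a small sketch;
       | broadcast_to returns a read-only view with a zero stride along
       | the repeated axis):
       | 
       |     import numpy as np
       |     
       |     a = np.ones((1000, 1))
       |     b = np.broadcast_to(a, (1000, 1000))
       |     print(b.strides)               # (8, 0): axis 1 re-reads the same memory
       |     print(np.shares_memory(a, b))  # True: a view, not a copy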
        
       | harpiaharpyja wrote:
       | I don't use numpy much, but based on the documentation for that
       | function and the fact that you can index by None to add another
       | dimension, it seems like the correct call is:
       | 
       |     y = linalg.solve(A, x[:, :, None])
       | 
       | The documentation given says that A should be (..., M, M), and x
       | should be (..., M, K). So if A is 100x5x5 and x is 100x5, then
       | all you need to do is convert x to 100x5x1.
       | 
       | Is that right? It doesn't seem that bad.
        
         | geysersam wrote:
         | You're right, but if you want `y` to have the same shape as
         | `x`, I think you also need to slice it afterwards:
         | 
         |     y = linalg.solve(A, x[..., None])[..., 0]
         | 
         | I don't really mind numpy syntax much; it gets the job done
         | in most scenarios. Numba complements it really well when the
         | code is easier to express as a couple of nested loops.
         | 
         | I used Julia for a while, but found it easier to predict the
         | performance of python+numpy+numba. Julia has a few footguns,
         | and the python ecosystem is just insanely polished.
        
       | credit_guy wrote:
       | Here's how you do it: focus on the simple case, solve it, then
       | ask Copilot to vectorize the code. Life is too short.
        
       | coolcase wrote:
       | Sounds like a hammer-and-nails problem. Your nail gun is called
       | PyTorch.
        
       | hatthew wrote:
       | Am I the only one who feels like this is a rant where the author
       | is projecting their unfamiliarity with numpy? Most of the
       | examples seem like straw men, and I can't tell whether they
       | really are, or whether the author genuinely thinks these are
       | all significant problems.
       | 
       | For example, in the linalg.solve problem, based on reading the
       | documentation for <60 seconds I had two guesses for what would
       | work, and from experimenting it looks like both work equally. If
       | you don't want to take the time to understand the documentation
       | or experiment to see what works, then just write the for loop.
       | 
       | For indexing problems, how about just don't use weird indexing
       | techniques if you don't understand them? I have literally never
       | needed to use a list of lists to index an array in my years of
       | using numpy. And if you do need to use it for some reason, spend
       | 1 minute to test out what happens. "What shape does B have?" is a
       | question that can be answered with `print(B.shape)`. If the
       | concern is about reading code written by other people, then
       | context should make it clear what's happening, and if it's not
       | clear then that's the fault of the person who wrote the code for
       | not using sensible variable names/comments.
        
         | kccqzy wrote:
         | I agree this is a rant. I don't think the author is projecting
         | their unfamiliarity here, but they are complaining that numpy
         | is too difficult to learn, especially that style where you
         | avoid all Python loops.
        
       | fscknumpy wrote:
       | I had to make a HN account just to show this numpy "feature".
       | 
       | Create an array like this np.zeros((10, 20, 30)), with shape (10,
       | 20, 30).
       | 
       | Clearly, if you ask for array[0][:, [0,1]].shape you will get
       | an output of shape (20, 2), because you first remove the first
       | dimension, then : keeps the second dimension, and you take the
       | first two elements of the third dimension.
       | 
       | Also, obviously we know that in numpy array[a][b][c] ==
       | array[a,b,c]. Therefore, what do you think array[0, :,
       | [0,1]].shape returns? (20, 2)? Nope. (2, 20).
       | 
       | Why? Nobody knows.
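       | 
       | (In fairness, NumPy's docs do explain it, under "combining
       | advanced and basic indexing": the scalar 0 and the list [0,1]
       | both count as advanced indices, and because they are separated
       | by a slice, their broadcast dimensions get moved to the front
       | of the result. A minimal repro:)
       | 
       |     import numpy as np
       |     
       |     a = np.zeros((10, 20, 30))
       |     # two separate indexing steps: basic slice, then fancy index
       |     print(a[0][:, [0, 1]].shape)   # (20, 2)
       |     # one step: the advanced indices 0 and [0, 1] broadcast to
       |     # shape (2,), which moves to the front of the result
       |     print(a[0, :, [0, 1]].shape)   # (2, 20)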
        
       | drhagen wrote:
       | > Personally, I think np.einsum is one of the rare parts of NumPy
       | that's actually good.
       | 
       | einsum only being able to do multiplication makes it quite
       | limited. If we leaned into the Einstein notation (e.g. [1]), we
       | could make something that is both quite nice and quite
       | performant.
       | 
       | [1] https://tensora.drhagen.com/
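       | 
       | To make the limitation concrete: einsum covers multiply-then-
       | sum and nothing else. Swap the reduction and you're back to
       | manual broadcasting (a small sketch of a "max of products"
       | reduction, which has no einsum spelling):
       | 
       |     import numpy as np
       |     
       |     a = np.random.rand(3, 4)
       |     b = np.random.rand(4, 5)
       |     
       |     ab = np.einsum('ij,jk->ik', a, b)  # multiply, then sum over j
       |     # multiply, then *max* over j: einsum can't express this
       |     ab_max = (a[:, :, None] * b[None, :, :]).max(axis=1)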
        
       | kccqzy wrote:
       | I have an unpopular opinion: I don't like numpy.einsum because it
       | is too different from the rest of numpy. You label your axes with
       | letters but none of the other regular numpy functions do that. I
       | usually avoid using numpy.einsum in favor of using a combination
       | of indexing notation with numpy.newaxis (None), broadcasting, and
       | numpy.swapaxes.
       | 
       | And I learned from someone more senior than me that you should
       | instead label your variables with single-letter axis names.
       | This way, the reader reads regular non-einsum operations and
       | still has the axis information in their mental model. And when
       | you use numpy.einsum, that axis-labeling information just gets
       | duplicated.
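       | 
       | A sketch of that convention (the names are made up; q_nd is "n
       | queries by d features", k_md is "m keys by d features"):
       | 
       |     import numpy as np
       |     
       |     q_nd = np.random.rand(4, 8)
       |     k_md = np.random.rand(5, 8)
       |     
       |     # expand with None, multiply, reduce: the 'nd,md->nm' einsum
       |     scores_nm = (q_nd[:, None, :] * k_md[None, :, :]).sum(axis=-1)
       |     assert np.allclose(scores_nm,
       |                        np.einsum('nd,md->nm', q_nd, k_md))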
        
       | zdevito wrote:
       | I tried to do something similar with 'first-class' dimension
       | objects in PyTorch
       | https://github.com/pytorch/pytorch/blob/main/functorch/dim/R... .
       | For instance multi-head attention looks like:
       |     import torch
       |     from torchdim import dims, softmax
       |     
       |     def multiheadattention(q, k, v, num_attention_heads,
       |                            dropout_prob, use_positional_embedding):
       |         batch, query_sequence, key_sequence, heads, features = dims(5)
       |         heads.size = num_attention_heads
       |     
       |         # binding dimensions, and unflattening the heads from the
       |         # feature dimension
       |         q = q[batch, query_sequence, [heads, features]]
       |         k = k[batch, key_sequence, [heads, features]]
       |         v = v[batch, key_sequence, [heads, features]]
       |     
       |         # einsum-style operators to calculate scores
       |         attention_scores = (q * k).sum(features) * (features.size ** -0.5)
       |     
       |         # use a first-class dim to specify the softmax dimension
       |         attention_probs = softmax(attention_scores, dim=key_sequence)
       |     
       |         # dropout works pointwise, following Rule #1
       |         attention_probs = torch.nn.functional.dropout(
       |             attention_probs, p=dropout_prob)
       |     
       |         # another matrix product
       |         context_layer = (attention_probs * v).sum(key_sequence)
       |     
       |         # flatten heads back into features
       |         return context_layer.order(batch, query_sequence,
       |                                    [heads, features])
       | 
       | However, my impression from trying to get a wider userbase is
       | that while numpy-style APIs may not be as good as some better
       | array language, they are not really the bottleneck for getting
       | things done in PyTorch. Other domains might suffer more,
       | though, and I am very excited to see a better array language
       | catch on.
        
       | cycomanic wrote:
       | I sort of agree with the author that N>3-dimensional arrays are
       | cumbersome in numpy; that said, I think this is partly because
       | we are really not that great at thinking in higher dimensions.
       | I'm interested in what the author's solution to the problem is,
       | but unlike the author I'm not a big fan of the Einstein
       | notation, though maybe I just don't use it often enough. I
       | don't see the issue with the a[:,:,None] notation, and it has
       | never given me trouble, but I do agree about the issue with
       | index arrays: I often write something which I think should
       | work, and then have to go back to the documentation to realise
       | that, no, that's not how it works.
       | 
       | The inconsistency for argument naming is also annoying (even more
       | if we include scipy), e.g. why is it np.fft.fft(x, axis=1) but
       | np.fft.fftshift(x, axes=1)?!
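       | 
       | Both signatures are real, for what it's worth:
       | 
       |     import numpy as np
       |     
       |     x = np.random.rand(4, 8)
       |     X = np.fft.fft(x, axis=1)        # singular keyword here...
       |     Xs = np.fft.fftshift(X, axes=1)  # ...plural keyword here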
        
       | josefrichter wrote:
       | You could try Elixir's Nx and related ecosystem.
        
       | janalsncm wrote:
       | I would like to be able to turn off implicit broadcasting
       | entirely. It has caused me so many evil bugs (PyTorch, but the
       | same thing applies to Numpy).
       | 
       | Imagine all of the weirdness of js "truthiness" bugs (with silent
       | unexpected behavior) combined with the inherent difficulty of
       | debugging matrix multiplications in 2, 3, or 4 dimensions. I
       | would rather herd goats in a Lumon basement.
       | 
       | Jax can do it but I'm not willing to migrate everything.
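       | 
       | In the meantime, a trivial guard works as a stopgap (a sketch;
       | strict_add is a made-up helper):
       | 
       |     import numpy as np
       |     
       |     def strict_add(a, b):
       |         # refuse the implicit broadcast that a + b would do silently
       |         if a.shape != b.shape:
       |             raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
       |         return a + b
       |     
       |     strict_add(np.zeros((3, 4)), np.zeros((3, 4)))  # fine
       |     try:
       |         strict_add(np.zeros((3, 4)), np.zeros((4,)))
       |     except ValueError as e:
       |         print(e)  # caught, instead of a silent broadcast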
        
       | frollogaston wrote:
       | About optimization, I feel like NumPy is meant to be feature-
       | rich, convenient, and decently fast, not the fastest thing
       | possible. There are semi drop in replacements that use more CPU
       | parallelism or GPU, but they don't cover everything. Just wish it
       | were clearer _which_ NumPy build I 'm installing, cause
       | apparently `pip3 install numpy` on my Mac gave me something built
       | with the worst flags possible.
       | 
       | About >2 dimensions, I always found this confusing in NumPy but
       | just chalked it up to >2 dim arrays being inherently confusing.
       | Maybe there really is a better way.
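       | 
       | (You can at least inspect what a given wheel was built
       | against:)
       | 
       |     import numpy as np
       |     
       |     print(np.__version__)
       |     np.show_config()  # prints the BLAS/LAPACK the build links against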
        
       ___________________________________________________________________
       (page generated 2025-05-15 23:00 UTC)