[HN Gopher] Einsum in Depth
___________________________________________________________________
Einsum in Depth
Author : joelburget
Score : 76 points
Date : 2025-01-03 16:34 UTC (3 days ago)
(HTM) web link (einsum.joelburget.com)
(TXT) w3m dump (einsum.joelburget.com)
| dan-robertson wrote:
| I was pretty confused by this for a while. I think the context I
| was missing is that this is about a function in numpy called
| 'einsum', which is somewhat related to Einstein summation
| notation.
|
| To write a little more: there are two things people mean by
| 'tensor'. One is a kind of geometric object that corresponds to a
| multilinear map between vector spaces (or modules, I suppose),
| and the other is an array indexed by k-tuples (i.e. what would be
| called a multidimensional array in C or Fortran programming). For
| a given choice of basis, one can represent a degree-k tensor with
| a k-dimensional array of scalars.
|
| Only certain operations make sense geometrically on tensors (in
| the sense that they do not depend on the choice of basis) and
| these can be broken down into:
|
| - tensor products, which take degree n and m tensors and output a
| degree (n + m) tensor
|
| - contractions which take a degree n tensor and output a degree
| (n-2) tensor
|
| - generalized transpositions which take a degree n tensor and
| output a degree n tensor
|
| A matrix multiplication can be seen as a composition of a tensor
| product and a contraction; a matrix trace is just a contraction.
|
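| To make that concrete with numpy's einsum (a minimal sketch; A, B
| and M here are just arbitrary conforming arrays made up for
| illustration):
|
|     import numpy as np
|
|     A, B = np.random.rand(3, 4), np.random.rand(4, 5)
|
|     # tensor product followed by a contraction over j
|     # == matrix multiplication
|     assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)
|
|     # contracting the two indices of a single matrix == trace
|     M = np.random.rand(3, 3)
|     assert np.isclose(np.einsum('ii->', M), np.trace(M))
|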
| The Einstein summation convention is a notation which succinctly
| expresses these geometric operations by describing what one does
| with the 'grid of numbers' representation, combined with the
| convention that, if an index is repeated in a term twice (an odd
| number bigger than 1 is meaningless, an even number is equivalent
| to reapplying the 'twice' rule many times), one should implicitly
| sum the expression over each basis vector for that index. You get
| tensor products by juxtaposition, contractions by repeated
| indexes, and transpositions by reordering indexes.
|
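| For instance, in numpy terms (again a small sketch with made-up
| arrays; contraction by a repeated index was already shown above
| with 'ii->' and 'ij,jk->ik'):
|
|     import numpy as np
|
|     u, v = np.random.rand(3), np.random.rand(4)
|     A = np.random.rand(3, 4)
|
|     # juxtaposition -> tensor (outer) product
|     assert np.allclose(np.einsum('i,j->ij', u, v), np.outer(u, v))
|
|     # reordering the output indices -> (generalized) transposition
|     assert np.allclose(np.einsum('ij->ji', A), A.T)
|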
| In numpy, it is used for general computation rather than for
| expressing something geometric, so one doesn't need the
| restrictions on the number of times an index occurs. Instead I
| guess the rule is something like:
|
| - if an index appears only on the lhs (the inputs), sum over it
|
| - if an index appears on both the lhs and the rhs (the output),
| don't sum over it
|
| - if an index appears only on the rhs, or is repeated on the rhs,
| that's an error
|
| And computationally I guess it's something like: (1) figure out
| the output shape, then (2) for each output position, sum a
| product over all values of the contracted indices:
|
|     for output_index of output:
|         s = 0
|         for summed_index of contracted_indices:
|             p = 1
|             for (input, input_indexes) of (inputs, lhs):
|                 p = p * input[input_indexes(output_index, summed_index)]
|             s = s + p
|         output[output_index] = s
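| Here's a minimal runnable sketch of that rule (my own toy version,
| handling only explicit specs like 'ij,jk->ik' with no broadcasting
| -- not numpy's actual algorithm), checked against np.einsum:
|
|     import itertools
|     import numpy as np
|
|     def naive_einsum(spec, *arrays):
|         lhs, rhs = spec.replace(' ', '').split('->')
|         specs = lhs.split(',')
|         # map each index letter to its size, from the input shapes
|         sizes = {}
|         for s, arr in zip(specs, arrays):
|             for letter, dim in zip(s, arr.shape):
|                 sizes[letter] = dim
|         # indices appearing only on the lhs get summed over
|         summed = [l for l in sizes if l not in rhs]
|         ranges = lambda letters: [range(sizes[l]) for l in letters]
|         out = np.zeros([sizes[l] for l in rhs])
|         for out_idx in itertools.product(*ranges(rhs)):
|             env = dict(zip(rhs, out_idx))
|             total = 0.0
|             for sum_idx in itertools.product(*ranges(summed)):
|                 env.update(zip(summed, sum_idx))
|                 p = 1.0
|                 for s, arr in zip(specs, arrays):
|                     p *= arr[tuple(env[l] for l in s)]
|                 total += p
|             out[tuple(out_idx)] = total
|         return out
|
|     A, B = np.random.rand(2, 3), np.random.rand(3, 4)
|     M = np.random.rand(3, 3)
|     assert np.allclose(naive_einsum('ij,jk->ik', A, B), A @ B)
|     assert np.isclose(naive_einsum('ii->', M), np.trace(M))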
| almostgotcaught wrote:
| Academic CS, more so than any of the other STEM subjects, is
| like that "Whose Line Is It Anyway" meme - all the words are
| completely made up and absolutely none of the results matter.
|
| So a vector isn't a vector, a tensor isn't a tensor, einsum
| isn't actually Einstein summation, automatic differentiation
| often gives derivatives for things that aren't differentiable,
| a neural network has nothing to do with neurons, etc etc etc
|
| CS people think it's a crime that humanities people make up
| words in order to publish but they're really living in an
| extremely glassy glass house.
| mjburgess wrote:
| Yes, I think it comes from the kind of metaphorical language
| that writing programs induces. Just as artists "kinda lie" when
| describing how their art represents some emotion or reality,
| computer scientists are engaged in this 'metaphorical
| representation' game -- unlike the sciences, which aim to
| actually represent, not metaphorically.
|
| This also drives me mad when reading many a computer science
| paper now -- a disrespect for the ordinary meaning of words this
| obscene is otherwise found only in the most radical kinds of
| poetry. It feels wholly out of place in anything called a
| 'science', and leads to endless amounts of confusion.
|
| There are none in all of academia so brazen in a poetic use
| of language and at the same time so incurious as to its
| meaning.
| nyrikki wrote:
| Vec4 is a vector, but yes, C++'s std::vector is not.
|
| In some ways CS is more correct with the modern (abstract)
| algebra definition of a vector, vs. the historical didactic
| arrow concept.
|
| Tensor has been abused; it is a property of bilinear maps,
| and as matrix multiplication is a bilinear operation there is
| a conceptual overlap there.
|
| Vectors are really just elements of a vector space, not
| necessarily arrows.
| tfgg wrote:
| And one of the interesting (and hard) computer science problems
| is how to turn an arbitrary einsum expression into the most
| efficient series of operations (transpose, matmul, etc.)
| available in hardware kernels. This is mentioned in the post
| under "How Contractions Are Actually Computed".
| gus_massa wrote:
| I mostly agree.
|
| > _an even number is equivalent to reapplying the 'twice' rule
| many times_
|
| I'd never heard of that extension, and in my opinion it's a bad
| idea. One nice property of the Einstein summation notation is
| that you can reorder the tensors (with their indices), and the
| result does not change. If you allow 4x repetitions, this is no
| longer valid.
|
| ---
|
| Also, a nice property of tensors written on paper is that you
| can write the indices as subscripts or superscripts to remember
| how they change when there is a change of basis. You can only
| collapse a subscript with a superscript, so the result of the
| operation does not depend on the choice of basis. (It would be
| very nice if Python could remember and check this too, to avoid
| silly errors.)
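|
| For what it's worth, numpy's einsum does keep the reordering
| property (a quick check with arbitrary matrices):
|
|     import numpy as np
|
|     A, B = np.random.rand(3, 4), np.random.rand(4, 5)
|
|     # reordering the operands together with their index groups
|     # leaves the result unchanged
|     assert np.allclose(np.einsum('ij,jk->ik', A, B),
|                        np.einsum('jk,ij->ik', B, A))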
| dan-robertson wrote:
| Ah, I think I'm just wrong and the 4x thing shouldn't be
| allowed.
| chpatrick wrote:
| For anyone who does a lot of einsum I highly recommend the einx
| package because the stock single-letter syntax very quickly
| becomes hard to read.
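| Roughly, the named-axis style looks like this (I haven't
| double-checked einx's exact call signature here, so this sketch
| uses einops.einsum, which reads similarly):
|
|     import numpy as np
|     from einops import einsum  # pip install einops
|
|     x = np.random.rand(8, 3, 4)
|     y = np.random.rand(8, 4, 5)
|
|     # descriptive axis names instead of single letters
|     z = einsum(x, y, 'batch i j, batch j k -> batch i k')
|     assert z.shape == (8, 3, 5)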
| pama wrote:
| In your experience, why einx over einops+einsum?
| thomasahle wrote:
| I've found that thinking of tensors in terms of graphs makes
| einsums much more natural.
|
| For example, a matrix product MN, `a b, b c -> a c` is just two
| nodes with two edges each: `-a- M -b- N -c-`. Their `b` edges are
| connected, so the resulting graph has only two "free" edges `a`
| and `c`. That's how we know the result is another matrix.
|
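| In numpy terms (a tiny sketch with made-up shapes):
|
|     import numpy as np
|
|     M, N = np.random.rand(3, 4), np.random.rand(4, 5)
|
|     # the shared edge b is contracted away; the free edges a and c
|     # remain, so the result is a (3, 5) matrix
|     out = np.einsum('ab,bc->ac', M, N)
|     assert out.shape == (3, 5)
|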
| Once you look at tensors this way, a number of things that are
| normally tricky with standard matrix notation become trivial,
| such as the higher-order derivatives used in neural networks.
|
| I wrote https://tensorcookbook.com/ to give a simple reference
| for all of this.
| ziofill wrote:
| A few years ago I wrote a note (never published) on how many
| products can be seen in this way
| https://arxiv.org/abs/1903.01366
| thomasahle wrote:
| This is beautiful! I see we even decided on the same
| "vectorization tensor" and notation, which makes Khatri-Rao and
| all the other "matrix products" much more intuitive!
| ziofill wrote:
| I just opened your book, very nice! I really like the
| derivatives. You went above and beyond with latex diagrams ^^
| thomasahle wrote:
| At this point I feel like I need to write a tikz library just
| for tensor diagrams...
| 6gvONxR4sf7o wrote:
| I use a similar notation, but never quite found a satisfactory
| notation for elementwise operations (e.g. `-M-a + -b`,
| especially broadcasted ones which I end up doing as `-A-B- + -c
| 1-`) or for denoting what derivatives are with respect to.
| Using differentials gets around some of the latter, but still,
| I was never quite satisfied. Any chance you've found nice
| options there?
| thomasahle wrote:
| The broadcasting using "1-" (or the order-1 copy/Kronecker
| tensor) is actually pretty nice, I think. It allows a lot of
| nice manipulations, and makes the low rank of the matrices
| really clear, etc.
|
| Addition does feel a bit hacky, but it may just be necessary
| for any associative notation. At least it means that rules such
| as distributivity work the same as in classical notation.
|
| I also wrote this tensorgrad library to do symbolic
| calculations using tensor networks:
| https://github.com/thomasahle/tensorgrad it keeps track of
| what you are taking derivatives with respect to. But I agree
| it would be nice to show more explicitly.
| stared wrote:
| I had been working on a library for visualizing tensor
| contractions (so-called Penrose tensor diagrams),
| https://github.com/Quantum-Flytrap/tensor-diagrams, with an
| example (a tutorial stub) https://p.migdal.pl/art-tensor-
| diagrams/.
|
| My background is in quantum physics, and it is a useful tool
| for understanding some aspects of entanglement. Here, the idea
| was to show its use in machine learning in general and deep
| learning in particular.
| ssivark wrote:
| I wish Python had tools to support combining tensors a la
| _einsum_ but with arbitrary operations instead of just
| multiplication and addition. The only tool I'm aware of that
| provides a very slick interface for this is _Tullio_ in Julia.
|
| Among other things, that would make a lot of code very
| convenient -- including graph algorithms. This is the idea behind
| GraphBLAS: https://graphblas.org/
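|
| There's no generalized-einsum API in numpy itself, but as an
| illustration of the idea, a min-plus ("tropical") matrix product
| can at least be spelled with broadcasting (the matrix below is a
| made-up adjacency matrix of edge weights):
|
|     import numpy as np
|
|     D = np.array([[0., 1., np.inf],
|                   [np.inf, 0., 2.],
|                   [5., np.inf, 0.]])
|
|     # replace einsum's (*, +) with (+, min): one step of
|     # all-pairs shortest paths, D2[i, j] = min_k D[i, k] + D[k, j]
|     D2 = np.min(D[:, :, None] + D[None, :, :], axis=1)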
| captainmuon wrote:
| Really interesting; I have been confused by the einsum
| function before. As a former physicist, I would also like to see
| the actual tensor notation for the examples. So instead of ij,jk
| something like $A_i^j B_j^k$ (imagine the rendered math here
| instead of the LaTeX).
| joshlk wrote:
| Tensors as used in deep learning don't match the definition used
| by physicists - blame the DL community for this :). DL tensors
| are just N-dimensional arrays of data, and there is no concept of
| covariance and contravariance of the dimensions. You could think
| of DL tensors as Cartesian tensors; they don't need to conform to
| the same transformation laws that physics tensors do.
| pama wrote:
| Einsum immediately clicked with me because in my past advanced
| classical mechanics courses such concise contractions of multi-
| index creatures were really the only way to make quick sense of
| complicated problems without the limitations of low-dimensional
| array notations. Physics tensors are of course different
| creatures than the simpler multidimensional arrays of PyTorch
| and co., but the simplified einsum notation still works very
| well. I ended up sometimes rewriting my Einsum code to plain
| tensor manipulation code in order to better work with
| collaborators who didn't like einsum, but I still experiment in
| my own hacks with einsum when I need to. Occasionally I felt
| that it would have been nice to also have the Levi-Civita
| symbol available in einsum, or to be able to use complex
| numbers and take complex conjugates, but these are all super-
| specialized requests and there often is a good way around them
| without modifying einsum.
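|
| For what it's worth, both workarounds end up being short in
| practice (a sketch with made-up vectors):
|
|     import numpy as np
|
|     # Levi-Civita symbol as an explicit rank-3 array; a cross
|     # product is then just another contraction
|     eps = np.zeros((3, 3, 3))
|     eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
|     eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1
|
|     a, b = np.random.rand(3), np.random.rand(3)
|     assert np.allclose(np.einsum('ijk,j,k->i', eps, a, b),
|                        np.cross(a, b))
|
|     # einsum never conjugates, so conjugate explicitly for a
|     # complex inner product
|     x = np.random.rand(4) + 1j * np.random.rand(4)
|     y = np.random.rand(4) + 1j * np.random.rand(4)
|     assert np.isclose(np.einsum('i,i->', np.conj(x), y),
|                       np.vdot(x, y))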
| joelburget wrote:
| This is a good idea, though one problem is that Einsum notation
| (as realized in Numpy and Pytorch) doesn't support the notion
| of co-contravariance, and the site is based on their Einsum
| notation. I could potentially add the variances for the
| examples, though that would move away from how the site
| currently works (where the information about the reduction
| comes only from the einsum input).
| sakex wrote:
| That's something I wish I had when I started looking at einsums.
| It gets interesting when you start thinking about optimum paths
| (like in the opt_einsum package), sharded/distributed einsums,
| and ML accelerators.
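|
| A minimal taste of the opt_einsum path search (shapes made up for
| illustration):
|
|     import numpy as np
|     import opt_einsum as oe  # pip install opt-einsum
|
|     A = np.random.rand(10, 100)
|     B = np.random.rand(100, 5)
|     C = np.random.rand(5, 50)
|
|     # report the chosen pairwise contraction order and its
|     # estimated cost, then execute it
|     path, info = oe.contract_path('ij,jk,kl->il', A, B, C)
|     print(info)
|     result = oe.contract('ij,jk,kl->il', A, B, C)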
| siver_john wrote:
| God I love einsum so much. I discovered it in grad school and it
| made my life so much nicer, definitely recommend learning it, you
| can do some wicked powerful stuff. (One of the first things I
| asked one of the NVIDIA guys when they were showing off cuPy a
| few years back was whether it had einsum.)
___________________________________________________________________
(page generated 2025-01-06 23:01 UTC)