[HN Gopher] Einsum in Depth
       ___________________________________________________________________
        
       Einsum in Depth
        
       Author : joelburget
       Score  : 76 points
       Date   : 2025-01-03 16:34 UTC (3 days ago)
        
 (HTM) web link (einsum.joelburget.com)
 (TXT) w3m dump (einsum.joelburget.com)
        
       | dan-robertson wrote:
       | I was pretty confused by this for a while. I think the context I
        | was missing is that this is about a function in numpy called
       | 'einsum' which is somewhat related to Einstein summation
       | notation.
       | 
        | To write a little more: there are two things people mean by
        | 'tensor'. One is a kind of geometric object that corresponds to a
        | multilinear map between vector spaces (or modules, I suppose),
        | and another is an array indexed by k-tuples (i.e. what would be
        | called a multidimensional array in C or Fortran programming). For
        | a given choice of basis, one can represent a degree k tensor with
        | a k-dimensional array of scalars.
       | 
       | Only certain operations make sense geometrically on tensors (in
       | the sense that they do not depend on the choice of basis) and
       | these can be broken down into:
       | 
       | - tensor products, which take degree n and m tensors and output a
       | degree (n + m) tensor
       | 
       | - contractions which take a degree n tensor and output a degree
       | (n-2) tensor
       | 
       | - generalized transpositions which take a degree n tensor and
       | output a degree n tensor
       | 
       | A matrix multiplication can be seen as a composition of a tensor
       | product and a contraction; a matrix trace is just a contraction.
       | 
        | The Einstein summation convention is a notation which succinctly
        | expresses these geometric operations by describing what one does
        | with the 'grid of numbers' representation, combined with the
        | convention that, if an index is repeated twice in a term (an odd
        | number bigger than 1 is meaningless, an even number is equivalent
        | to reapplying the 'twice' rule many times), one should implicitly
        | sum the expression over the basis vectors for that index. You get
        | tensor products by juxtaposition, contractions by repeated
        | indices, and transpositions by reordering indices.
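        | 
        | In numpy einsum strings, those three operations look roughly like
        | this (a sketch):
        | 
        |     import numpy as np
        | 
        |     u, v = np.random.rand(3), np.random.rand(4)
        |     A = np.random.rand(3, 3)
        | 
        |     # juxtaposition -> tensor product: T[i, j] = u[i] * v[j]
        |     T = np.einsum('i,j->ij', u, v)
        | 
        |     # repeated index -> contraction: the matrix trace
        |     tr = np.einsum('ii->', A)
        | 
        |     # reordered indices -> (generalized) transposition
        |     At = np.einsum('ij->ji', A)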
       | 
        | In numpy, it is for general computation rather than expressing
        | something geometric, so one doesn't need the restrictions on the
        | number of times an index occurs. Instead I guess the rule is
        | something like:
        | 
        | - if an index appears only on the lhs, sum over it
        | 
        | - if an index appears on both the lhs and the rhs, don't sum
        | 
        | - if an index appears only on the rhs, or is repeated on the rhs,
        | that's an error
       | 
        | And computationally I guess it's something like (1) figure out
        | the output shape and (2):
        | 
        |     for output_index of output:
        |         s = 0
        |         for summed_index of summed_indices:
        |             p = 1
        |             for (input, input_indexes) of (inputs, lhs):
        |                 p = p * input[input_indexes(output_index, summed_index)]
        |             s = s + p
        |         output[output_index] = s
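        | 
        | Checking those guessed rules against numpy itself (quick sketch):
        | 
        |     import numpy as np
        | 
        |     A = np.random.rand(3, 4)
        | 
        |     # index only on the lhs -> summed over (row sums here)
        |     np.einsum('ij->i', A)
        | 
        |     # index on both sides -> kept; nothing is summed
        |     np.einsum('ij->ji', A)
        | 
        |     # index only on the rhs -> numpy raises an error
        |     try:
        |         np.einsum('ij->ijk', A)
        |     except ValueError as e:
        |         print(e)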
        
         | almostgotcaught wrote:
          | Academic CS, more so than any of the other STEM subjects, is
          | like that Whose Line Is It Anyway meme - all the words are
          | completely made up and absolutely none of the results matter.
         | 
         | So a vector isn't a vector, a tensor isn't a tensor, einsum
         | isn't actually Einstein summation, automatic differentiation
         | often gives derivatives for things that aren't differentiable,
         | a neural network has nothing to do with neurons, etc etc etc
         | 
         | CS people think it's a crime that humanities people make up
         | words in order to publish but they're really living in an
         | extremely glassy glass house.
        
           | mjburgess wrote:
           | Yes, I think it comes from the kind of metaphorical language
           | that writing programs induces. If artists "kinda lie" when
           | describing how their art represents some emotion or reality,
           | likewise, computer scientists are also engaged in this
            | 'metaphorical representation' game -- unlike the sciences,
            | which aim to actually represent, not metaphorically.
           | 
            | This also drives me mad reading many a computer science paper
            | now -- a disrespect for the ordinary meaning of words so
            | obscene you otherwise only find it in the most radical kinds
            | of poetry. This feels wholly out of place in anything called
            | a 'science', and leads to endless amounts of confusion.
           | 
           | There are none in all of academia so brazen in a poetic use
           | of language and at the same time so incurious as to its
           | meaning.
        
           | nyrikki wrote:
            | Vec4 is a vector, but yes, C++'s std::vector is not.
            | 
            | In some ways CS is more correct with the modern (abstract)
            | algebra definition of a vector, vs the historical didactic
            | 'arrow' concept.
            | 
            | Tensor has been abused; it is a property of bilinear maps,
            | and as matrix multiplication is a bilinear operation there is
            | a concept overlap there.
           | 
           | Vectors are really just elements of a vector space, not
           | necessarily arrows.
        
         | tfgg wrote:
         | And one of the interesting (and hard) computer science problems
         | is how to turn an arbitrary einsum expression into the most
         | efficient series of operations (transpose, matmul, etc.)
         | available in hardware kernels. This is mentioned in the post
         | under "How Contractions Are Actually Computed".
        
         | gus_massa wrote:
         | I mostly agree.
         | 
         | > _an even number is equivalent to reapplying the 'twice' rule
         | many times_
         | 
          | I had never heard of that extension, and in my opinion it is a
          | bad idea. One nice property of the Einstein summation notation
          | is that you can reorder the tensors (with their indices) and
          | the result does not change. If you allow 4x repetitions, this
          | is no longer valid.
         | 
         | ---
         | 
          | Also, a nice property of tensors written on paper is that you
          | can write the indices as subscripts or superscripts to remember
          | how they change under a change of basis. You can only collapse
          | a subscript with a superscript, so the result of the operation
          | does not depend on the choice of basis. (It would be very nice
          | if Python could remember and check this too, to avoid silly
          | errors.)
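          | 
          | A toy sketch of what such a check could look like (entirely
          | hypothetical; it just wraps np.einsum with one variance string
          | per operand, 'u' for superscript, 'd' for subscript):
          | 
          |     import numpy as np
          | 
          |     def checked_einsum(subscripts, variances, *operands):
          |         # refuse to contract two indices of the same variance
          |         lhs, rhs = subscripts.split('->')
          |         seen = {}
          |         for term, var in zip(lhs.split(','), variances):
          |             for letter, v in zip(term, var):
          |                 seen.setdefault(letter, []).append(v)
          |         for letter, vs in seen.items():
          |             if letter not in rhs and sorted(vs) != ['d', 'u']:
          |                 raise ValueError(
          |                     f"index {letter!r} contracts {vs}; need "
          |                     "one upper and one lower")
          |         return np.einsum(subscripts, *operands)
          | 
          |     A = np.random.rand(3, 4)  # A_i^j -> 'du'
          |     B = np.random.rand(4, 5)  # B_j^k -> 'du'
          |     checked_einsum('ij,jk->ik', ['du', 'du'], A, B)  # ok
          |     # checked_einsum('ij,jk->ik', ['dd', 'du'], A, B)  # raises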
        
           | dan-robertson wrote:
            | Ah, I think I'm just wrong and the 4x thing shouldn't be
            | allowed.
        
       | chpatrick wrote:
       | For anyone who does a lot of einsum I highly recommend the einx
       | package because the stock single-letter syntax very quickly
       | becomes hard to read.
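        | 
        | For comparison, the named-axis style reads much closer to the
        | intent; shown here with einops.einsum (einx goes further, but
        | take the exact spelling here as my approximation, not gospel):
        | 
        |     import numpy as np
        |     from einops import einsum
        | 
        |     q = np.random.rand(8, 128, 64)  # batch, queries, head dim
        |     k = np.random.rand(8, 128, 64)  # batch, keys, head dim
        | 
        |     # np.einsum equivalent: np.einsum('bqd,bkd->bqk', q, k)
        |     scores = einsum(q, k,
        |                     'batch query d, batch key d -> batch query key')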
        
         | pama wrote:
         | In your experience, why einx over einops+einsum?
        
       | thomasahle wrote:
        | I've found that thinking of tensors in terms of graphs makes
        | einsums much more natural.
       | 
       | For example, a matrix product MN, `a b, b c -> a c` is just two
       | nodes with two edges each: `-a- M -b- N -c-`. Their `b` edges are
       | connected, so the resulting graph has only two "free" edges `a`
       | and `c`. That's how we know the result is another matrix.
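        | 
        | In numpy terms, a quick sketch of the same picture:
        | 
        |     import numpy as np
        | 
        |     M = np.random.rand(3, 4)
        |     N = np.random.rand(4, 5)
        |     P = np.random.rand(5, 3)
        | 
        |     # open chain -a- M -b- N -c-: b is a shared edge, so it is
        |     # contracted; free edges a and c remain -> a matrix
        |     MN = np.einsum('ab,bc->ac', M, N)
        | 
        |     # closed loop M -b- N -c- P -a- back to M: no free edges
        |     # remain -> a scalar (the trace of the product)
        |     loop = np.einsum('ab,bc,ca->', M, N, P)
        |     assert np.isclose(loop, np.trace(M @ N @ P))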
       | 
        | Once you look at tensors this way, a number of things that are
        | normally tricky with standard matrix notation become trivial,
        | such as the higher order derivatives used in neural networks.
       | 
       | I wrote https://tensorcookbook.com/ to give a simple reference
       | for all of this.
        
         | ziofill wrote:
         | A few years ago I wrote a note (never published) on how many
         | products can be seen in this way
         | https://arxiv.org/abs/1903.01366
        
           | thomasahle wrote:
           | This is beautiful! I see we even decided on the same
           | "vectorization tensor" and notation, which makes Kathri-Rao,
           | and all the other "matrix products" much more intuitive!
        
         | ziofill wrote:
         | I just opened your book, very nice! I really like the
         | derivatives. You went above and beyond with latex diagrams ^^
        
           | thomasahle wrote:
           | At this point I feel like I need to write a tikz library just
           | for tensor diagrams...
        
         | 6gvONxR4sf7o wrote:
         | I use a similar notation, but never quite found a satisfactory
         | notation for elementwise operations (e.g. `-M-a + -b`,
         | especially broadcasted ones which I end up doing as `-A-B- + -c
         | 1-`) or for denoting what derivatives are with respect to.
         | Using differentials gets around some of the latter, but still,
         | I was never quite satisfied. Any chance you've found nice
         | options there?
        
           | thomasahle wrote:
            | The broadcasting using "1-" (or the order-1 copy/Kronecker
            | tensor) is actually pretty nice, I think. It allows a lot of
            | nice manipulations, and makes the low rank of the matrices
            | really clear, etc.
            | 
            | Addition does feel a bit hacky, but it may just be necessary
            | for any associative notation. At least it means that rules
            | such as distributivity work the same as with classical
            | notation.
           | 
           | I also wrote this tensorgrad library to do symbolic
           | calculations using tensor networks:
           | https://github.com/thomasahle/tensorgrad it keeps track of
           | what you are taking derivatives with respect to. But I agree
           | it would be nice to show more explicitly.
        
         | stared wrote:
         | I had been working on a library for visualizing tensor
         | contractions (so-called Penrose tensor diagrams),
          | https://github.com/Quantum-Flytrap/tensor-diagrams, with an
          | example (a tutorial stub):
          | https://p.migdal.pl/art-tensor-diagrams/
         | 
         | My background is in quantum physics, and it is a useful tool
         | for understanding some aspects of entanglement. Here, the idea
         | was to show its use in machine learning in general and deep
         | learning in particular.
        
       | ssivark wrote:
        | I wish Python had tools to support combining tensors a la
        | _einsum_ but with arbitrary operations instead of just
        | multiplication and addition. The only tool I'm aware of that
        | provides a very slick interface for this is _Tullio_ in Julia.
        | 
        | Among other things, that would make a lot of code very
        | convenient -- including graph algorithms. This is the idea behind
        | GraphBLAS https://graphblas.org/
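        | 
        | For a taste of what that buys you, here is the 'ij,jk->ik'
        | pattern over the (min, +) semiring in plain numpy - one
        | relaxation step of all-pairs shortest paths (my sketch, not a
        | GraphBLAS API):
        | 
        |     import numpy as np
        | 
        |     def minplus_matmul(D, E):
        |         # [i, j, k] entry is D[i, j] + E[j, k]; reduce with min
        |         # over j instead of summing products
        |         return np.min(D[:, :, None] + E[None, :, :], axis=1)
        | 
        |     INF = np.inf
        |     D = np.array([[0.0, 3.0, INF],
        |                   [INF, 0.0, 1.0],
        |                   [2.0, INF, 0.0]])
        | 
        |     # repeated squaring converges to shortest path distances
        |     paths = D
        |     for _ in range(2):
        |         paths = minplus_matmul(paths, paths)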
        
       | captainmuon wrote:
       | Really interesting, I have been confused about the einsum
       | function before. As a former physicist, I would also like to see
       | the actual tensor notation for the examples. So instead of ij,jk
       | something like $A_i^j B_j^k$ (imagine the math here instead of
       | the LaTeX).
        
         | joshlk wrote:
          | Tensors used in deep learning are not the same as the tensors
          | defined by physicists - blame the DL community for this :). DL
          | tensors are just N-dimensional arrays of data, and there is no
          | concept of covariance and contravariance of the dimensions. You
          | could think of DL tensors as Cartesian tensors; they don't need
          | to conform to the same transformation laws that physics tensors
          | do.
        
         | pama wrote:
         | Einsum immediately clicked with me because in my past advanced
         | classical mechanics courses such concise contractions of multi-
         | index creatures were really the only way to make quick sense of
         | complicated problems without the limitations of low-dimensional
         | array notations. Physics tensors are of course different
         | creatures than the simpler multidimensional arrays of PyTorch
         | and co., but the simplified einsum notation still works very
         | well. I ended up sometimes rewriting my Einsum code to plain
         | tensor manipulation code in order to better work with
          | collaborators who didn't like einsum, but I still experiment in
         | my own hacks with einsum when I need to. Occasionally I felt
         | that it would have been nice to also have the Levi-Civita
         | symbol available in einsum, or to be able to use complex
         | numbers and take complex conjugates, but these are all super-
         | specialized requests and there often is a good way around them
         | without modifying einsum.
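          | 
          | For what it's worth, the Levi-Civita case has a reasonable
          | workaround in plain numpy - materialize the symbol once and
          | contract with it like any other operand (a sketch):
          | 
          |     import numpy as np
          | 
          |     eps = np.zeros((3, 3, 3))
          |     eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
          |     eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1
          | 
          |     a = np.array([1.0, 2.0, 3.0])
          |     b = np.array([4.0, 5.0, 6.0])
          | 
          |     # (a x b)_i = eps_ijk a_j b_k
          |     cross = np.einsum('ijk,j,k->i', eps, a, b)
          |     assert np.allclose(cross, np.cross(a, b))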
        
         | joelburget wrote:
         | This is a good idea, though one problem is that Einsum notation
         | (as realized in Numpy and Pytorch) doesn't support the notion
          | of co- and contravariance, and the site is based on their Einsum
         | notation. I could potentially add the variances for the
         | examples, though that would move away from how the site
         | currently works (where the information about the reduction
         | comes only from the einsum input).
        
       | sakex wrote:
       | That's something I wish I had when I started looking at einsums.
       | It gets interesting when you start thinking about optimum paths
       | (like in the opt_einsum package), sharded/distributed einsums,
       | and ML accelerators.
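        | 
        | For anyone curious, the opt_einsum package is easy to try
        | (assuming it is installed; if I remember the API right, contract
        | and contract_expression are the main entry points):
        | 
        |     import numpy as np
        |     import opt_einsum as oe
        | 
        |     a = np.random.rand(10, 1000)
        |     b = np.random.rand(1000, 20)
        |     c = np.random.rand(20, 500)
        | 
        |     # picks a pairwise contraction order instead of evaluating
        |     # the expression naively
        |     out = oe.contract('ij,jk,kl->il', a, b, c)
        | 
        |     # precompile the path for repeated use with operands of the
        |     # same shapes
        |     expr = oe.contract_expression('ij,jk,kl->il',
        |                                   a.shape, b.shape, c.shape)
        |     out2 = expr(a, b, c)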
        
       | siver_john wrote:
       | God I love einsum so much. I discovered it in grad school and it
       | made my life so much nicer, definitely recommend learning it, you
       | can do some wicked powerful stuff. (One of the first things I
       | asked one of the NVIDIA guys when they were showing off cuPy a
       | few years back was whether it had einsum.)
        
       ___________________________________________________________________
       (page generated 2025-01-06 23:01 UTC)