[HN Gopher] Mamba: Linear-Time Sequence Modeling with Selective ...
       ___________________________________________________________________
        
       Mamba: Linear-Time Sequence Modeling with Selective State Spaces
        
       Author : anigbrowl
       Score  : 56 points
       Date   : 2023-12-04 20:12 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | haltist wrote:
       | This looks like a monad.
        
       | triyambakam wrote:
        | I've been doing web-related engineering for the past few
        | years and have recently become interested in machine
        | learning. A random perusal of the code for this [1] feels
        | daunting. So many single-letter variables - it looks like
        | old JS soup code. It makes me hesitant to leave TypeScript
        | and even Rust (for web tooling).
       | 
       | [1] https://github.com/state-
       | spaces/mamba/blob/main/mamba_ssm/op...
        
         | ekelsen wrote:
          | The variable names A, B, C, D and related assume
          | familiarity with the state space model formulation, where
          | these are the conventional names for the matrices in the
          | state space equations.
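          | 
          | For reference, those equations are h'(t) = A h(t) + B x(t)
          | and y(t) = C h(t) + D x(t). Discretized, the recurrence
          | looks roughly like this (an illustrative sketch, not the
          | repo's actual code):
          | 
          |     import torch
          | 
          |     def ssm_scan(A, B, C, D, x):
          |         # Illustrative classic discrete SSM recurrence.
          |         # A: (N, N) state matrix, B: (N,) input matrix,
          |         # C: (N,) output matrix, D: scalar skip term,
          |         # x: (L,) input sequence -> y: (L,) outputs.
          |         h = torch.zeros(A.shape[0])
          |         ys = []
          |         for x_t in x:
          |             h = A @ h + B * x_t         # h_t = A h_{t-1} + B x_t
          |             ys.append(C @ h + D * x_t)  # y_t = C h_t + D x_t
          |         return torch.stack(ys)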
        
           | triyambakam wrote:
           | Thanks, that's helpful.
        
         | junipertea wrote:
          | The code corresponds to the paper's formulas as a
          | courtesy. While descriptive comments or function
          | signatures might be nicer, it is still _reasonably_ easy
          | to follow along from the paper's description! (Although
          | most of the complexity is hidden away in
          | selective_scan_cuda.fwd().)
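          | 
          | What that fused kernel computes is, at its core, a
          | sequential recurrence - something like this rough
          | reference sketch (shapes and names are illustrative; the
          | real kernel is a fused parallel scan over batched
          | tensors):
          | 
          |     import torch
          | 
          |     def selective_scan_sketch(u, delta, A, B, C, D):
          |         # u: (L, d) inputs, delta: (L, d) step sizes,
          |         # A: (d, n) diagonal state decay,
          |         # B, C: (L, n) input-dependent projections,
          |         # D: (d,) skip connection.
          |         L, d = u.shape
          |         h = torch.zeros(d, A.shape[1])
          |         ys = []
          |         for t in range(L):
          |             dA = torch.exp(delta[t, :, None] * A)
          |             dBu = delta[t, :, None] * B[t] * u[t, :, None]
          |             h = dA * h + dBu                # state update
          |             ys.append(h @ C[t] + D * u[t])  # readout + skip
          |         return torch.stack(ys)              # (L, d)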
        
           | triyambakam wrote:
           | Aha, thanks, that is helpful. Any chance you know of a guide
           | that walks through a paper and code side by side?
        
         | filterfiber wrote:
         | > It makes me hesitant to leave Typescript and even Rust (for
         | web tooling).
         | 
          | To be clear, this is not what a "typical" Python
          | application or script looks like at all. While there's
          | definitely an argument to be made about Python's immature
          | type support (especially compared to TypeScript/Rust),
          | don't use this as an example of Python's readability.
         | 
         | This was written to closely follow the math equations, not to
         | be a maintainable piece of software. It makes a lot more sense
         | from a mathematics/academic perspective, but not from a
         | software development perspective.
         | 
          | Strong typing support would do nothing for readability
          | here - if anything, it would make this harder to read.
          | Type annotations don't fix the naming, commenting, and
          | formatting that make this so hard to read.
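          | 
          | For illustration, a fully annotated (hypothetical)
          | signature still tells you nothing about what A, B, and C
          | mean without the state-space background:
          | 
          |     from torch import Tensor
          | 
          |     def scan(A: Tensor, B: Tensor, C: Tensor,
          |              u: Tensor) -> Tensor:
          |         # Hypothetical stub: the types are precise, but
          |         # the names still require the paper.
          |         ...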
        
         | vaillant wrote:
          | Well, this is cutting-edge ML code written in PyTorch. I
          | wouldn't worry about not understanding something like this
          | - you'd start with scikit-learn first.
        
       | intalentive wrote:
       | >letting the SSM parameters be functions of the input
       | 
        | That's what the attention mechanism has going for it:
        | different weights for different sets of tokens.
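        | 
        | Concretely, it's the difference between a time-invariant
        | SSM, with one fixed B and C reused for every token, and
        | Mamba's selection mechanism, where B, C, and the step size
        | delta are projected from each token. A hypothetical sketch
        | (names and shapes are illustrative, not the repo's):
        | 
        |     import torch
        |     import torch.nn as nn
        |     import torch.nn.functional as F
        | 
        |     class SelectiveParams(nn.Module):
        |         # Illustrative module: project per-token SSM
        |         # parameters from the input itself.
        |         def __init__(self, d_model, d_state):
        |             super().__init__()
        |             self.to_B = nn.Linear(d_model, d_state)
        |             self.to_C = nn.Linear(d_model, d_state)
        |             self.to_dt = nn.Linear(d_model, d_model)
        | 
        |         def forward(self, x):       # x: (L, d_model)
        |             B = self.to_B(x)        # (L, d_state), per token
        |             C = self.to_C(x)        # (L, d_state), per token
        |             dt = F.softplus(self.to_dt(x))  # positive steps
        |             return B, C, dt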
        
         | wizzard0 wrote:
          | Yes - the authors claim their mechanism is more
          | computationally efficient (both asymptotically and on
          | current hardware) than Transformer attention, while
          | performing much the same role.
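          | 
          | A toy illustration of the asymptotics (not from the
          | paper): attention materializes an L x L score matrix,
          | while a scan does one constant-size state update per
          | token.
          | 
          |     import torch
          | 
          |     L, d = 1024, 64
          |     Q, K = torch.randn(L, d), torch.randn(L, d)
          |     scores = Q @ K.T    # (L, L): O(L^2) time and memory
          | 
          |     x = torch.randn(L)
          |     a, b = torch.rand(L), torch.rand(L)
          |     h = torch.zeros(())
          |     for t in range(L):  # O(L) total, O(1) state
          |         h = a[t] * h + b[t] * x[t]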
        
       | mxwsn wrote:
       | Twitter thread from Albert Gu, a primary author:
       | https://nitter.net/_albertgu/status/1731727672286294400?s=20
       | 
       | "Quadratic attention has been indispensable for information-dense
       | modalities such as language... until now.
       | 
       | Announcing Mamba: a new SSM arch. that has linear-time scaling,
       | ultra long context, and most importantly--outperforms
       | Transformers everywhere we've tried."
        
       | og_kalu wrote:
        | If there's really no catch, then this is a groundbreaking
        | paper - unambiguously beating Transformers by a clear
        | margin on all the metrics they tested.
       | 
       | https://imgur.com/a/WKFF1yv
        
       ___________________________________________________________________
       (page generated 2023-12-04 23:00 UTC)