[HN Gopher] Mamba: Linear-Time Sequence Modeling with Selective ...
___________________________________________________________________
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Author : anigbrowl
Score : 56 points
Date : 2023-12-04 20:12 UTC (2 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| haltist wrote:
| This looks like a monad.
| triyambakam wrote:
| I've been doing web-related engineering for the past few years
| and recently became interested in machine learning. Randomly
| perusing the code for this [1] feels daunting. So many single-
| letter variables; it looks like old JS soup code. It makes me
| hesitant to leave TypeScript and even Rust (for web tooling).
|
| [1] https://github.com/state-
| spaces/mamba/blob/main/mamba_ssm/op...
| ekelsen wrote:
| The variable names A, B, C, D and related ones assume
| familiarity with the state-space model formulation, where
| these are the conventional names for the matrices in the
| state-space equations.
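| For reference, the textbook (continuous-time) state-space
| system those names come from; Mamba works with a discretized
| version of the same equations:
|
|     h'(t) = A h(t) + B x(t)    # state update
|     y(t)  = C h(t) + D x(t)    # output / readout
|
| A is the state matrix, B the input matrix, C the output
| matrix, and D a direct feedthrough (skip) term.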
| triyambakam wrote:
| Thanks, that's helpful.
| junipertea wrote:
| The naming corresponds to the paper's formulas as a courtesy
| to readers. While descriptive comments or function signatures
| might be nicer, it is still _reasonably_ easy to follow along
| from the paper's description! (Although most of the complexity
| is hidden away in selective_scan_cuda.fwd().)
| triyambakam wrote:
| Aha, thanks, that is helpful. Any chance you know of a guide
| that walks through a paper and code side by side?
| filterfiber wrote:
| > It makes me hesitant to leave Typescript and even Rust (for
| web tooling).
|
| To be clear, this is not what a "typical" Python application
| or script looks like at all. While there's definitely an
| argument to be made about Python's immature type support
| (especially compared to TypeScript or Rust), don't use this
| as an example of Python's readability.
|
| This was written to closely follow the math equations, not to
| be a maintainable piece of software. It makes a lot more sense
| from a mathematics/academic perspective, but not from a
| software development perspective.
|
| Strong typing support would do nothing for readability here;
| if anything it would make this harder to read. Type
| annotations don't fix the naming, commenting, and formatting
| that make this code hard to follow.
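| As a toy contrast (hypothetical lines, not from the repo):
|
|     # math-style, mirrors the paper's symbols:
|     y = (h * C).sum(-1) + D * u
|
|     # software-style, self-documenting:
|     out = (state * readout).sum(-1) + skip_gain * inputs
|
| The first is easier to check against the paper; the second is
| easier to maintain without it.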
| vaillant wrote:
| Well, this is cutting-edge ML code written in PyTorch. I
| wouldn't worry about understanding something like this yet;
| you'd start with scikit-learn first.
| intalentive wrote:
| > letting the SSM parameters be functions of the input
|
| That's what the attention mechanism has going for it:
| different weights for different sets of tokens.
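| And the paper's "selection" mechanism is just that: B, C, and
| the step size delta are computed from each token. A minimal
| sketch of the idea (layer and parameter names are mine, not
| the repo's):
|
|     import torch
|     import torch.nn as nn
|     import torch.nn.functional as F
|
|     class SelectiveParams(nn.Module):
|         def __init__(self, d_model, d_state):
|             super().__init__()
|             self.B_proj = nn.Linear(d_model, d_state)
|             self.C_proj = nn.Linear(d_model, d_state)
|             self.dt_proj = nn.Linear(d_model, 1)
|
|         def forward(self, x):  # x: (batch, len, d_model)
|             B = self.B_proj(x)        # per-token input matrix
|             C = self.C_proj(x)        # per-token output matrix
|             dt = F.softplus(self.dt_proj(x))  # positive step
|             return B, C, dt.expand(-1, -1, x.shape[-1])
|
| Per-token B, C, and delta like these are exactly what a
| selective scan (as sketched above) consumes.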
| wizzard0 wrote:
| The authors claim their mechanism is more computationally
| efficient (both asymptotically and on current hardware) than
| Transformer attention, while performing much the same role,
| yes.
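| Roughly, per layer (the paper's framing; L = sequence length,
| d = model width, N = small fixed state size):
|
|     attention:      O(L^2 * d) time, O(L^2) attention scores
|     selective scan: O(L * d * N) time, i.e. linear in L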
| mxwsn wrote:
| Twitter thread from Albert Gu, a primary author:
| https://nitter.net/_albertgu/status/1731727672286294400?s=20
|
| "Quadratic attention has been indispensable for information-dense
| modalities such as language... until now.
|
| Announcing Mamba: a new SSM arch. that has linear-time scaling,
| ultra long context, and most importantly--outperforms
| Transformers everywhere we've tried."
| og_kalu wrote:
| If there's really no catch, then this is a groundbreaking
| paper: unambiguously beating Transformers by a clear margin
| on every metric they tested.
|
| https://imgur.com/a/WKFF1yv
___________________________________________________________________
(page generated 2023-12-04 23:00 UTC)