[HN Gopher] A pure NumPy implementation of Mamba
___________________________________________________________________
A pure NumPy implementation of Mamba
Author : julius
Score : 97 points
Date : 2024-06-06 07:55 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| thoronton wrote:
| Why is it so difficult to write a short description of what the
| project does? With too many open source projects, people who
| are not familiar with them have to play detective to figure
| out what they actually do. "Wait, a package manager based on
| numpy? That doesn't make any sense. Oh, they mention LLM? So
| it must have something to do with AI."
| arthurcolle wrote:
| Realistically, it's like a classification problem.
|
| At this moment in time, if you see Mamba, either you know or
| you don't.
| grandma_tea wrote:
| That's a fair criticism of many open source projects; however,
| this one does link to the Mamba paper at the bottom of the
| (short) readme.
| Tomte wrote:
| The author did not post it to HN to confuse you. He did not
| post it here at all.
|
| Why are you entitled to have every single GitHub repo
| explained, tailored to your individual knowledge?
|
| Many other people understood exactly what this is.
|
| Maybe the submitter could add a comment on HN with an
| explanation, but the author owes you nothing.
| bartread wrote:
| Mmmmmmm... you have a valid point about entitlement and OSS,
| but I'm also going to agree with GP here. I'm not particularly
| poking at this project, or the work that's gone into it, but
| too many projects don't have a short paragraph explaining
| what their purpose is and why it matters.
|
| I'm not going to name names because I don't want to throw
| shade at what are essentially good or even great projects
| but, as a recent example, I encountered a library in our
| codebase the other day where I simply didn't get what the
| point was, and the corresponding project page and
| documentation - whilst really detailed in some ways - didn't
| help. In the end I asked ChatGPT and also found a series of
| video tutorials that I watched at 1.75x speed to understand
| it.
|
| It was worth doing that because the thing is already used in
| our codebase, and it's important in that context for me to
| understand why and the value it adds.
|
| But if I run across something reading an article or whatever,
| and it mentions some library or project in passing, I'm semi-
| regularly left a bit baffled as to what and why and I
| probably don't have the time to go digging. Nowadays I
| probably _would_ ask ChatGPT for a short summary because it's
| so convenient and it's often quicker than Googling, and
| maybe I'll start submitting PRs against readme.md files to
| add those summaries (with a bit of editing) to the beginning
| of them.
| edflsafoiewq wrote:
| The doc comment at the top of the .py file is sufficiently
| descriptive:
|
|     """Simple, minimal implementation of Mamba in one file of
|     Numpy, adapted from [1] and inspired by [2]. Suggest
|     reading the following before/while reading the code:
|
|     [1] Mamba: Linear-Time Sequence Modeling with Selective
|         State Spaces (Albert Gu and Tri Dao)
|         https://arxiv.org/abs/2312.00752
|     [2] The Annotated S4 (Sasha Rush and Sidd Karamcheti)
|         https://srush.github.io/annotated-s4
|     """
| 1024core wrote:
| > The doc comment at the top of the .py file is sufficiently
| descriptive
|
| Which is the purpose of these doc comments.
|
| If you have the time to gripe on HN, you have the time to
| click on the link and do some reading. The "Usage" section in
| the link above is enough to help one disambiguate; if not,
| then there's always the doc comment.
| exe34 wrote:
| I believe the gripe is a plea for others not to do the same
| thing, and instead to put some thought into presentation; it
| is not about this specific case. If the poster is anything
| like me: when the first ten words of your post don't make
| sense to me, I'm just moving on to something else.
| mint2 wrote:
| No, I can see the commenter's frustration. Unless one is
| versed in the LLM space, one is more likely to know Mamba as
| the package manager and find the headline, and also the GitHub
| page, confusing. The markdown readme is supposed to provide
| the info the commenter wanted.
|
| Even that first line you posted is unhelpfully circular,
| defining Mamba as an implementation of Mamba.
|
| Call me old-fashioned, but a best-practice readme should
| concisely provide: what the thing is, and why it is, i.e. the
| problem it solves. (And not with a circular definition.)
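|
| For illustration only (my wording and title, not the repo's),
| something like this at the top of the readme would do:
|
|     # <project name>
|     A minimal, pure-NumPy implementation of the Mamba sequence
|     model (https://arxiv.org/abs/2312.00752), written for
|     reading and learning rather than production use.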
| ktm5j wrote:
| Okay, so why not just put that in the readme??
| rowanG077 wrote:
| Totally unclear what this is. I scrolled through the readme and
| it didn't even mention once what it does.
| wodenokoto wrote:
| It totally mentions what it does. It takes the sentence "I have
| a dream that" and extends it to: "I have a dream that I will be
| able to see the sunrise in the morning."
|
| It's an LLM.
| szvsw wrote:
| It's much more than just an LLM. The mamba architecture is
| often used in the _backbone_ of an LLM but you can use it
| more generally as a linear-time (as opposed to quadratic-
| time) sequence modeling architecture (as per the original
| paper's title, which is cited in the linked repo). It is much
| closer to a convolutional network or an RNN (it has bits of
| both) than to a transformer architecture. It is based on the
| notion of state spaces (with a twist).
|
| I use Mamba for instance to build surrogate models of
| physics-based building energy models which can generate
| 15-min interval data for heating, cooling, electricity, and
| hot water usage of any building in the US from building
| characteristics, weather timeseries, and occupancy time
| series.
|
| It has many other non-NLP applications.
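|
| To make "linear-time" concrete, here is a rough sketch of the
| core selective-scan recurrence in plain NumPy. Names and
| shapes are illustrative, not taken from the linked repo:
|
|     import numpy as np
|
|     # h[t] = dA[t] * h[t-1] + dB[t] * x[t];  y[t] = C[t] . h[t]
|     def selective_scan(x, dA, dB, C):
|         # x: (L, D) inputs; dA, dB: (L, D, N) input-dependent,
|         # discretized state updates; C: (L, N) readout weights.
|         L, D = x.shape
|         N = dA.shape[-1]
|         h = np.zeros((D, N))           # hidden state per channel
|         ys = np.empty((L, D))
|         for t in range(L):
|             h = dA[t] * h + dB[t] * x[t][:, None]  # state update
|             ys[t] = h @ C[t]                       # read out
|         return ys
|
| One pass over the L timesteps, so cost grows linearly in
| sequence length rather than quadratically as with attention.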
| Ddav wrote:
| Would love to hear more about that building energy
| modelling example, have you done a writeup you could share?
| szvsw wrote:
| The Mamba application is my current research project so I
| haven't published anything yet. But the basic idea is to
| create a latent representation of the static features,
| repeat the latent vector to form a time series,
| concatenate with the weather/occupancy time series, run
| through mamba layers, and bob's your uncle. Shoot me an
| email (in my bio) if you would like to chat more!
|
| I can also share my master's thesis, which is similar but
| uses CNN layers rather than Mamba, and only makes monthly
| predictions rather than 15-min interval data. There are
| some other architectural differences but the basics are
| the same. That work is also globally robust.
|
| As you can imagine, the current work I am doing at a much
| higher resolution is a big step up, and Mamba so far is
| working out great.
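|
| In pseudo-NumPy, the input assembly looks roughly like this
| (all names and shapes are made up for illustration):
|
|     import numpy as np
|
|     def build_inputs(static_feats, weather_ts, occupancy_ts, W):
|         # static_feats: (F,) building characteristics
|         # weather_ts: (L, Wd), occupancy_ts: (L, Od) time series
|         z = np.tanh(W @ static_feats)             # latent code, (Z,)
|         z_rep = np.tile(z, (len(weather_ts), 1))  # repeat to (L, Z)
|         return np.concatenate(                    # feed to Mamba layers
|             [z_rep, weather_ts, occupancy_ts], axis=1)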
| ahmadmijot wrote:
| Can I see your thesis?
|
| I'm currently learning about machine learning and digital
| twins but don't really know where to start.
| blagie wrote:
| Is there a good, easy tutorial?
| sva_ wrote:
| It's an LLM architecture competing with transformers:
| https://arxiv.org/abs/2312.00752
|
| Proponents usually highlight its inference performance, in
| particular its linear scaling with the number of input tokens.
| szvsw wrote:
| I really disagree with pigeonholing it as an LLM
| architecture! It is much more general than that as I
| mentioned in another comment in this post [1] (and of course
| as mentioned in the original paper which you linked).
|
| [1] https://news.ycombinator.com/item?id=40616181
| piqufoh wrote:
| Completely - I assumed it was an implementation of
| https://github.com/mamba-org/mamba
|
| I also assumed that "a pure NumPy implementation" meant it was
| built purely with NumPy, which it isn't, smh.
| Hugsun wrote:
| Is there a benefit to implementing it in numpy over pytorch or
| tf?
| blagie wrote:
| Yes.
|
| A numpy program will work tomorrow.
|
| ALL of the machine learning frameworks have incredible churn. I
| have code from two years ago which I can't make work reliably
| anymore -- not for lack of trying -- due to all the breaking
| changes and dependency issues. There are systems where each
| model runs in its own docker, with its own set of pinned
| library versions (many with security issues now). It's a
| complete and utter trainwreck. Don't even get me started on
| CUDA versions (or Intel/AMD compatibility, or older /
| deprecated GPUs).
|
| For comparison, virtually all of my non-machine-learning
| Python code from 2010 still works in 2024.
|
| There are good reasons for this. Those breaking changes aren't
| just for fun; they reflect the very rapid rate of progress in
| the field. In contrast, Python and numpy are mature systems.
| Still, it makes many machine learning models insanely
| expensive to maintain in production environments.
|
| If you're a machine learning researcher, it's fine, but if you
| have a system like an ecommerce web site or a compiler or
| whatever, where you'd like to be able to plug in a task-
| specific ML model, your down payment is a weekend of hacking
| to make it work, but your ongoing rent in maintenance costs
| might be a few weeks each year for each model you use. I have a
| million places I'd love to plug in a little bit of ML. However,
| I'm very judicious with it, not because it's hard to do, but
| because it's expensive to maintain.
|
| A pure Python + numpy implementation would mean that you can
| avoid all of that.
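|
| Concretely, the maintenance surface shrinks to something like
| this (a hypothetical requirements file, not this repo's):
|
|     # requirements.txt -- one mature, slow-moving dependency
|     numpy>=1.24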
| ptspts wrote:
| Contrary to what the title says, this is not a pure Python +
| numpy implementation: in fact it also imports einops,
| transformers, and torch.
|
| For me, "pure X" means: to use this, all you have to install
| is X.
| qwertox wrote:
| I was struggling with the fairness of your comment, because
| the libraries are not used as a replacement for NumPy but to
| ease dealing with the data. This made me check, and it turns
| out that:
|
| "Yes, the comment you mentioned is fair and reflects a common
| perspective in the programming and data science communities
| regarding the usage of "pure" implementations. When someone
| refers to a "pure X implementation," the typical expectation is
| that the implementation will rely solely on the functionalities
| of library X, without introducing dependencies from other
| libraries or frameworks."
|
| TIL.
| rsfern wrote:
| I don't see a PyTorch import, and the transformers import is
| just for the tokenizer, which I don't really consider a
| nontrivial part of Mamba.
|
| So it's just numpy and einops, which is pretty cool. I guess
| you could probably rewrite all the einops stuff in pure numpy
| if you want to trade readable code for eliminating the einops
| dependency.
|
| Edit: found the torch import, but it's just for a single
| torch.load to deserialize some data.
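|
| For example, a typical einops pattern and a plain-NumPy
| equivalent (an illustrative pattern, not the repo's actual
| code):
|
|     import numpy as np
|     # einops: rearrange(x, 'b l (h d) -> b h l d', h=h)
|     b, l, h, d = 2, 6, 4, 8
|     x = np.random.randn(b, l, h * d)
|     y = x.reshape(b, l, h, d).transpose(0, 2, 1, 3)  # (b, h, l, d)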
| TeMPOraL wrote:
| > _Edit: found the torch import, but it's just for a single
| torch.load to deserialize some data_
|
| Torch is quite heavy though, isn't it? All for that one
| deserialization call?
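|
| If the checkpoint were converted once to NumPy's own format,
| even that import could go away. A sketch, assuming the
| checkpoint is a flat dict of tensors (hypothetical filenames):
|
|     import torch
|     import numpy as np
|
|     # One-time conversion, done by whoever still has torch:
|     state = torch.load("model.pt", map_location="cpu")
|     np.savez("model.npz",
|              **{k: v.numpy() for k, v in state.items()})
|
|     # Later, with no torch installed:
|     npz = np.load("model.npz")
|     weights = {k: npz[k] for k in npz.files}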
| nobodywillobsrv wrote:
| Yes exactly. It's classic mis-selling.
| mint2 wrote:
| Okay, so is Mamba also an LLM? There's too much name
| overloading!
|
| I'm familiar with mamba, the conda-like thing in Python, but a
| numpy implementation of that makes no sense.
| nerdponx wrote:
| I'm not normally one to gripe about name conflicts, but I knew
| this was going to get confusing. You could install an
| implementation of the Mamba LLM using the Mamba package
| manager!
| exe34 wrote:
| Maybe we should get rid of names entirely and call it all
| software. "software: a super fast and fun retro-encabulator
| written in rust!"
| volemo wrote:
| Assign UUIDs to everything!
___________________________________________________________________
(page generated 2024-06-08 23:01 UTC)