[HN Gopher] A pure NumPy implementation of Mamba
___________________________________________________________________
A pure NumPy implementation of Mamba
Author : julius
Score : 97 points
Date : 2024-06-06 07:55 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| thoronton wrote:
| Why is it so difficult to write a short description of what the
| project does? With too many open source projects, people who
| are not familiar with them have to play detective to figure
| out what they actually do. "Wait, a package manager based on
| numpy? That doesn't make any sense. Oh, they mention LLM? So
| it must have something to do with AI."
| arthurcolle wrote:
| Realistically, it's like a classification problem.
|
| At this moment in time, if you see Mamba, either you know or
| you don't.
| grandma_tea wrote:
| That's a fair criticism of many open source projects; however,
| this one does link to the Mamba paper at the bottom of the
| (short) readme.
| Tomte wrote:
| The author did not post it to HN to confuse you. He did not
| post it here at all.
|
| Why are you entitled to have every single GitHub repo
| explained, tailored to your individual knowledge?
|
| Many other people understood exactly what this is.
|
| Maybe the submitter could add a comment on HN with an
| explanation, but the author owes you nothing.
| bartread wrote:
| Mmmmmmm... you have a valid point about entitlement and OSS,
| but I'm also going to agree with GP here. I'm not particularly
| poking at this project, or the work that's gone into it, but
| too many projects don't have a short paragraph explaining
| what their purpose is and why it matters.
|
| I'm not going to name names because I don't want to throw
| shade at what are essentially good or even great projects
| but, as a recent example, I encountered a library in our
| codebase the other day where I simply didn't get what the
| point was, and the corresponding project page and
| documentation - whilst really detailed in some ways - didn't
| help. In the end I asked ChatGPT and also found a series of
| video tutorials that I watched at 1.75x speed to understand
| it.
|
| It was worth doing that because the thing is already used in
| our codebase, and it's important in that context for me to
| understand why and the value it adds.
|
| But if I run across something reading an article or whatever,
| and it mentions some library or project in passing, I'm semi-
| regularly left a bit baffled as to what and why and I
| probably don't have the time to go digging. Nowadays I
| probably _would_ ask ChatGPT for a short summary because it's
| so convenient and it's often quicker than Googling, and
| maybe I'll start submitting PRs against readme.md files to
| add those summaries (with a bit of editing) to the beginning
| of them.
| edflsafoiewq wrote:
| The doc comment at the top of the .py file is sufficiently
| descriptive:
|
|     """Simple, minimal implementation of Mamba in one file of
|     Numpy, adapted from [1] and inspired by [2]. Suggest
|     reading the following before/while reading the code:
|
|     [1] Mamba: Linear-Time Sequence Modeling with Selective
|         State Spaces (Albert Gu and Tri Dao)
|         https://arxiv.org/abs/2312.00752
|     [2] The Annotated S4 (Sasha Rush and Sidd Karamcheti)
|         https://srush.github.io/annotated-s4
|     """
| 1024core wrote:
| > The doc comment at the top of the .py file is sufficiently
| descriptive
|
| Which is the purpose of these doc comments.
|
| If you have the time to gripe on HN, you have the time to
| click on the link and do some reading. The "Usage" section in
| the link above is enough to help one disambiguate; if not,
| then there's always the doc comment.
| exe34 wrote:
| I believe the gripe is a plea for others not to do the same
| thing, and instead to put some thought into presentation; it
| is not about this specific case. If the poster is anything
| like me: when the first ten words of your post don't make
| sense to me, I'm just moving on to something else.
| mint2 wrote:
| No, I can see the commenter's frustration. Unless one is
| versed in the LLM space, one is more likely to know Mamba as
| the package manager and find the headline, and also the GitHub
| page, confusing. The markdown readme is supposed to provide
| the info the commenter wanted.
|
| Even that first line you posted is unhelpfully circular,
| defining Mamba as an implementation of Mamba.
|
| Call me old-fashioned, but a best-practice readme should
| concisely provide: what the thing is, and why it is, i.e. the
| problem it solves. (And not with a circular definition.)
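|
| For illustration only (my wording and title, not the repo's),
| something like this at the top of the readme would do:
|
|     # <project name>
|     A minimal, pure-NumPy implementation of the Mamba sequence
|     model (https://arxiv.org/abs/2312.00752), written for
|     reading and learning rather than production use.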
| ktm5j wrote:
| Okay, so why not just put that in the readme??
| rowanG077 wrote:
| Totally unclear what this is. I scrolled through the readme and
| it didn't even mention once what it does.
| wodenokoto wrote:
| It totally mentions what it does. It takes the sentence "I have
| a dream that" and extends it to: "I have a dream that I will be
| able to see the sunrise in the morning."
|
| It's an LLM.
| szvsw wrote:
| It's much more than just an LLM. The mamba architecture is
| often used in the _backbone_ of an LLM but you can use it
| more generally as a linear-time (as opposed to quadratic-
| time) sequence modeling architecture (as per the original
| paper's title, which is cited in the linked repo). It is much
| closer to a convolutional network or an RNN (it has bits of
| both) than to a transformer architecture. It is based on the
| notion of state spaces (with a twist).
|
| I use Mamba for instance to build surrogate models of
| physics-based building energy models which can generate
| 15-min interval data for heating, cooling, electricity, and
| hot water usage of any building in the US from building
| characteristics, weather timeseries, and occupancy time
| series.
|
| It has many other non-NLP applications.
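|
| To make "linear-time" concrete, here is a rough sketch of the
| core selective-scan recurrence in plain NumPy. Names and
| shapes are illustrative, not taken from the linked repo:
|
|     import numpy as np
|
|     # h[t] = dA[t] * h[t-1] + dB[t] * x[t];  y[t] = C[t] . h[t]
|     def selective_scan(x, dA, dB, C):
|         # x: (L, D) inputs; dA, dB: (L, D, N) input-dependent,
|         # discretized state updates; C: (L, N) readout weights.
|         L, D = x.shape
|         N = dA.shape[-1]
|         h = np.zeros((D, N))           # hidden state per channel
|         ys = np.empty((L, D))
|         for t in range(L):
|             h = dA[t] * h + dB[t] * x[t][:, None]  # state update
|             ys[t] = h @ C[t]                       # read out
|         return ys
|
| One pass over the L timesteps, so cost grows linearly in
| sequence length rather than quadratically as with attention.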
| Ddav wrote:
| Would love to hear more about that building energy
| modelling example, have you done a writeup you could share?
| szvsw wrote:
| The Mamba application is my current research project so I
| haven't published anything yet. But the basic idea is to
| create a latent representation of the static features,
| repeat the latent vector to form a time series,
| concatenate with the weather/occupancy time series, run
| through mamba layers, and bob's your uncle. Shoot me an
| email (in my bio) if you would like to chat more!
|
| I can also share my master's thesis, which is similar but
| uses CNN layers rather than Mamba, and only makes monthly
| predictions rather than 15-min interval data. There are
| some other architectural differences but the basics are
| the same. That work is also globally robust.
|
| As you can imagine, the current work I am doing at a much
| higher resolution is a big step up, and Mamba so far is
| working out great.
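|
| In pseudo-NumPy, the input assembly looks roughly like this
| (all names and shapes are made up for illustration):
|
|     import numpy as np
|
|     def build_inputs(static_feats, weather_ts, occupancy_ts, W):
|         # static_feats: (F,) building characteristics
|         # weather_ts: (L, Wd), occupancy_ts: (L, Od) time series
|         z = np.tanh(W @ static_feats)             # latent code, (Z,)
|         z_rep = np.tile(z, (len(weather_ts), 1))  # repeat to (L, Z)
|         return np.concatenate(                    # feed to Mamba layers
|             [z_rep, weather_ts, occupancy_ts], axis=1)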
| ahmadmijot wrote:
| Can I see your thesis?
|
| I'm currently learning about machine learning and digital
| twins but don't really know where to start.
| blagie wrote:
| Is there a good, easy tutorial?
| sva_ wrote:
| It's an LLM architecture competing with transformers:
| https://arxiv.org/abs/2312.00752
|
| Proponents usually highlight its inference performance, in
| particular its linear scaling with the number of input tokens.
| szvsw wrote:
| I really disagree with pigeonholing it as an LLM
| architecture! It is much more general than that as I
| mentioned in another comment in this post [1] (and of course
| as mentioned in the original paper which you linked).
|
| [1] https://news.ycombinator.com/item?id=40616181
| piqufoh wrote:
| Completely - I assumed it was an implementation of
| https://github.com/mamba-org/mamba
|
| I also assumed that "a pure NumPy implementation" meant it was
| built purely with NumPy, which it isn't, smh.
| Hugsun wrote:
| Is there a benefit to implementing it in numpy over pytorch or
| tf?
| blagie wrote:
| Yes.
|
| A numpy program will work tomorrow.
|
| ALL of the machine learning frameworks have incredible churn. I
| have code from two years ago which I can't make work reliably
| anymore -- not for lack of trying -- due to all the breaking
| changes and dependency issues. There are systems where each
| model runs in its own docker, with its own set of pinned
| library versions (many with security issues now). It's a
| complete and utter trainwreck. Don't even get me started on
| CUDA versions (or Intel/AMD compatibility, or older /
| deprecated GPUs).
|
| For comparison, virtually all of my non-machine-learning
| Python code from 2010 still works in 2024.
|
| There are good reasons for this. Those breaking changes aren't
| just for fun; they reflect the very rapid rate of progress in
| the field. In contrast, Python and numpy are mature systems.
| Still, it makes many machine learning models insanely
| expensive to maintain in production environments.
|
| If you're a machine learning researcher, it's fine, but if you
| have a system like an ecommerce web site or a compiler or
| whatever, where you'd like to be able to plug in a task-
| specific ML model, your down payment is a weekend of hacking
| to make it work, but your ongoing rent in maintenance costs
| might be a few weeks each year for each model you use. I have a
| million places I'd love to plug in a little bit of ML. However,
| I'm very judicious with it, not because it's hard to do, but
| because it's expensive to maintain.
|
| A pure Python + numpy implementation would mean that you can
| avoid all of that.
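|
| Concretely, the maintenance surface shrinks to something like
| this (a hypothetical requirements file, not this repo's):
|
|     # requirements.txt -- one mature, slow-moving dependency
|     numpy>=1.24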
| ptspts wrote:
| Contrary to what the title says, this is not a pure Python +
| numpy implementation: in fact it also imports einops,
| transformers, and torch.
|
| For me, "pure X" means: to use this, all you have to install
| is X.
| qwertox wrote:
| I was struggling with the fairness of your comment, because
| the libraries are not used as a replacement for NumPy but to
| ease dealing with the data. This made me check, and it turns
| out that:
|
| "Yes, the comment you mentioned is fair and reflects a common
| perspective in the programming and data science communities
| regarding the usage of "pure" implementations. When someone
| refers to a "pure X implementation," the typical expectation is
| that the implementation will rely solely on the functionalities
| of library X, without introducing dependencies from other
| libraries or frameworks."
|
| TIL.
| rsfern wrote:
| I don't see a PyTorch import, and the transformers import is
| just for the tokenizer, which I don't really consider a
| nontrivial part of Mamba.
|
| So it's just numpy and einops, which is pretty cool. I guess
| you could probably rewrite all the einops stuff in pure numpy
| if you want to trade readable code for eliminating the einops
| dependency.
|
| Edit: found the torch import, but it's just for a single
| torch.load to deserialize some data.
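|
| For example, a typical einops pattern and a plain-NumPy
| equivalent (an illustrative pattern, not the repo's actual
| code):
|
|     import numpy as np
|     # einops: rearrange(x, 'b l (h d) -> b h l d', h=h)
|     b, l, h, d = 2, 6, 4, 8
|     x = np.random.randn(b, l, h * d)
|     y = x.reshape(b, l, h, d).transpose(0, 2, 1, 3)  # (b, h, l, d)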
| TeMPOraL wrote:
| > _Edit: found the torch import, but it's just for a single
| torch.load to deserialize some data_
|
| Torch is quite heavy though, isn't it? All for that one
| deserialization call?
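|
| If the checkpoint were converted once to NumPy's own format,
| even that import could go away. A sketch, assuming the
| checkpoint is a flat dict of tensors (hypothetical filenames):
|
|     import torch
|     import numpy as np
|
|     # One-time conversion, done by whoever still has torch:
|     state = torch.load("model.pt", map_location="cpu")
|     np.savez("model.npz",
|              **{k: v.numpy() for k, v in state.items()})
|
|     # Later, with no torch installed:
|     npz = np.load("model.npz")
|     weights = {k: npz[k] for k in npz.files}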
| nobodywillobsrv wrote:
| Yes exactly. It's classic mis-selling.
| mint2 wrote:
| Okay, so is Mamba also an LLM? There's too much name
| overloading!
|
| I'm familiar with mamba, the conda-like thing in Python, but a
| numpy implementation of that makes no sense.
| nerdponx wrote:
| I'm not normally one to gripe about name conflicts, but I knew
| this was going to get confusing. You could install an
| implementation of the Mamba LLM using the Mamba package
| manager!
| exe34 wrote:
| Maybe we should get rid of names entirely and call it all
| software. "software: a super fast and fun retro-encabulator
| written in rust!"
| volemo wrote:
| Assign UUIDs to everything!
___________________________________________________________________
(page generated 2024-06-08 23:01 UTC)