arXiv:2404.16112 [cs.LG]
Computer Science > Machine Learning
[Submitted on 24 Apr 2024]

Title: Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Authors: Badri Narayana Patro, Vijay Srinivas Agneeswaran

Abstract: Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) historically dominated sequence modeling tasks such as Machine Translation and Named Entity Recognition (NER). The advent of transformers shifted this paradigm, given their superior performance. Yet transformers suffer from $O(N^2)$ attention complexity and difficulty handling inductive bias. Several variants that use spectral networks or convolutions have been proposed to address these issues and perform well on a range of tasks, but they still struggle with very long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling in this context, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), the Linear Recurrent Unit (LRU), Liquid-S4, and Mamba. In this survey, we categorize the foundational SSMs into three paradigms: gating architectures, structural architectures, and recurrent architectures. The survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medicine (including genomics), chemistry (such as drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets such as Long Range Arena (LRA), WikiText, GLUE, the Pile, ImageNet, Kinetics-400, and SSv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for Mamba-360 is available at this https URL.
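For orientation, the mechanism the surveyed S4-style models share is a discretized linear state-space recurrence, h_t = A_bar h_{t-1} + B_bar x_t with readout y_t = C h_t, which processes a length-T sequence in O(T) sequential steps rather than the O(N^2) pairwise cost of full attention. The NumPy sketch below illustrates that plain recurrent form only; the function name ssm_scan and the toy parameters are illustrative assumptions, not code from the paper.

```python
import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Run a discretized linear state-space recurrence over a sequence.

    h_t = A_bar @ h_{t-1} + B_bar * x_t   (state update)
    y_t = C @ h_t                          (readout)

    x:     (T,)   scalar input sequence
    A_bar: (N, N) discretized state matrix
    B_bar: (N,)   discretized input projection
    C:     (N,)   output projection
    Returns y: (T,) outputs, computed in O(T) sequential steps.
    """
    N = A_bar.shape[0]
    h = np.zeros(N)                      # hidden state starts at zero
    y = np.empty_like(x, dtype=float)
    for t, x_t in enumerate(x):
        h = A_bar @ h + B_bar * x_t      # linear state update
        y[t] = C @ h                     # scalar readout
    return y

# Toy usage (illustrative parameters): a random stable SSM on a length-1000 input.
rng = np.random.default_rng(0)
N = 16
A_bar = 0.9 * np.eye(N) + 0.01 * rng.standard_normal((N, N))
B_bar = rng.standard_normal(N)
C = rng.standard_normal(N)
x = rng.standard_normal(1000)
y = ssm_scan(x, A_bar, B_bar, C)
print(y.shape)  # (1000,)
```

Two points the sketch omits: because the recurrence is linear and time-invariant, trained SSMs like S4 can equivalently be evaluated as a long convolution for parallel training, and Mamba additionally makes the discretized parameters input-dependent (its selective scan).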
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Cite as: arXiv:2404.16112 [cs.LG] (or arXiv:2404.16112v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2404.16112 (arXiv-issued DOI via DataCite)

Submission history:
From: Badri Narayana Patro
[v1] Wed, 24 Apr 2024 18:10:31 UTC (4,821 KB)