[HN Gopher] The Annotated S4
       ___________________________________________________________________
        
       The Annotated S4
        
       Author : profchemai
       Score  : 79 points
       Date   : 2024-02-12 15:06 UTC (7 hours ago)
        
 (HTM) web link (srush.github.io)
 (TXT) w3m dump (srush.github.io)
        
       | imjonse wrote:
       | There is a lot of intimidating math here that makes
       | self-attention tutorials seem like a walk in the park by
       | comparison. Luckily, subsequent state space models building on
       | S4 (DSS, S4D, and newer ones like Mamba) simplified the
       | primitives and the math involved.
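       | 
       | Roughly, the simplification is that with a diagonal state
       | matrix the S4 kernel collapses to a sum of geometric series. A
       | minimal sketch in JAX (not the papers' exact parameterization
       | or initialization; the names here are illustrative):
       | 
       |   import jax.numpy as jnp
       | 
       |   def diag_ssm_kernel(Lambda, B, C, step, L):
       |       # Lambda: (N,) complex diagonal of A;  B, C: (N,);  step: dt
       |       Lambda_bar = jnp.exp(Lambda * step)        # ZOH-discretized A
       |       B_bar = (Lambda_bar - 1.0) / Lambda * B    # ZOH-discretized B
       |       # K[l] = sum_n C_n * B_bar_n * Lambda_bar_n ** l
       |       powers = Lambda_bar[:, None] ** jnp.arange(L)
       |       return (C[:, None] * B_bar[:, None] * powers).sum(0).real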
        
         | marmaduke wrote:
         | The math is not designed to intimidate; rather, it approaches
         | the question of "how to build a sequence model" in a
         | principled way, starting from state space models, which draw
         | on an arguably longer literature than neural networks.
         | 
         | Some of the concepts are better explained here than anywhere
         | else, and they make it much easier to make sense of Mamba,
         | which is increasingly popular.
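         | 
         | The starting point is the classical linear state space model,
         | x'(t) = A x(t) + B u(t), y(t) = C x(t), discretized into a
         | recurrence over the sequence. A minimal JAX sketch of that
         | step (a bilinear discretization, one common choice; the names
         | and shapes here are illustrative):
         | 
         |   import jax.numpy as jnp
         |   from jax import lax
         | 
         |   def discretize(A, B, step):
         |       # Bilinear (Tustin) transform of the continuous system
         |       I = jnp.eye(A.shape[0])
         |       BL = jnp.linalg.inv(I - (step / 2.0) * A)
         |       return BL @ (I + (step / 2.0) * A), (BL * step) @ B
         | 
         |   def run_ssm(A_bar, B_bar, C, u):
         |       # x_k = A_bar x_{k-1} + B_bar u_k ;  y_k = C . x_k
         |       def step_fn(x, u_k):
         |           x = A_bar @ x + B_bar * u_k
         |           return x, C @ x
         |       x0 = jnp.zeros((A_bar.shape[0],))
         |       return lax.scan(step_fn, x0, u)[1]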
        
           | imjonse wrote:
           | I did not mean it in a negative way; this is a great
           | resource. But the math will be intimidating regardless for
           | most devs who don't have a solid math/signal-processing
           | background. It's way beyond the simple linear algebra and
           | chain rule from calculus that are required to understand
           | basic neural network training.
        
         | ptojr wrote:
         | Can someone point me to the DSS and S4D papers?
        
           | efm wrote:
           | DSS: Diagonal State Spaces are as Effective as Structured
           | State Spaces (https://arxiv.org/abs/2203.14343)
           | 
           | S4D: On the Parameterization and Initialization of Diagonal
           | State Space Models (https://arxiv.org/abs/2206.11893)
        
       | adamnemecek wrote:
       | All machine learning is convolution.
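       | 
       | For S4 specifically there is a precise version of this: because
       | the state recurrence is linear, the sequence map is exactly a
       | 1-D causal convolution with kernel K[l] = C A_bar^l B_bar. A
       | rough JAX sketch (naive kernel construction, just to show the
       | equivalence; S4 itself computes K much more cleverly):
       | 
       |   import jax.numpy as jnp
       | 
       |   def ssm_conv_kernel(A_bar, B_bar, C, L):
       |       # Impulse response of the recurrence: K[l] = C A_bar^l B_bar
       |       pow_l = lambda l: jnp.linalg.matrix_power(A_bar, l)
       |       return jnp.stack([C @ pow_l(l) @ B_bar for l in range(L)])
       | 
       |   def causal_conv(u, K):
       |       # y_k = sum_{l <= k} K[l] * u[k-l], same as the recurrence
       |       return jnp.convolve(u, K, mode="full")[: u.shape[0]]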
        
       | srush wrote:
       | Hi! Blog author. This was an attempt a couple years ago to
       | understand and write about this paper in a detailed way. Here is
       | a video going through this topic as well:
       | https://youtu.be/dKJEpOtVgXc?si=PDNO0B0qi6ARHaeb
       | 
       | Section 2 of the blog post is no longer very relevant. A lot of
       | advances (DSS, S4D) simplified that part of the process.
       | Arguably, this should all also be updated for Mamba (same
       | authors).
        
         | radarsat1 wrote:
         | This was an excellent write-up, thanks. It'll help me understand
         | the Mamba work a lot more.
         | 
         | I still find it really confusing how a linear model can perform
         | so well.
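         | 
         | My current understanding is that only the sequence mixing
         | inside each layer is linear; the stack gets its nonlinearity
         | from the position-wise functions and residuals between the
         | layers. A toy single-channel sketch of that structure
         | (illustrative names; the GELU here stands in for the real
         | GLU/MLP blocks):
         | 
         |   import jax
         |   import jax.numpy as jnp
         | 
         |   def s4_style_layer(u, K, w, b):
         |       # Linear sequence mixing: a long causal convolution with
         |       # the SSM kernel K
         |       y = jnp.convolve(u, K, mode="full")[: u.shape[0]]
         |       # Position-wise nonlinearity: this is what keeps a stack
         |       # of layers from collapsing into one big linear map
         |       return jax.nn.gelu(w * y + b)
         | 
         |   def s4_style_stack(u, kernels, ws, bs):
         |       for K, w, b in zip(kernels, ws, bs):
         |           u = u + s4_style_layer(u, K, w, b)  # residual
         |       return u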
        
         | jwuphysics wrote:
         | Thanks for your spectacular resources! I see that you began an
         | Annotated Mamba repository -- any chance you could share when
         | that blog page might go live?
        
       | medv wrote:
       | What do I need to learn to start understanding these articles?
       | Are there any good courses on the topic, for beginners?
        
       ___________________________________________________________________
       (page generated 2024-02-12 23:00 UTC)