[HN Gopher] The Annotated S4
___________________________________________________________________
The Annotated S4
Author : profchemai
Score : 79 points
Date : 2024-02-12 15:06 UTC (7 hours ago)
(HTM) web link (srush.github.io)
(TXT) w3m dump (srush.github.io)
| imjonse wrote:
| A lot of intimidating math that will make all self-attention
| tutorials seem like a walk in the park in comparison. Luckily
| subsequent state space models building on S4 (DSS, S4D and newer
| ones like Mamba) simplified the primitives and the math used.
| marmaduke wrote:
| The math is not designed to intimidate but rather to approach
| the question of "how to build a sequence model" in a principled
| way from state space models, which draw on an arguably longer
| literature than neural networks.
|
| Some of the concepts are better explained here than anywhere else,
| and make it straightforward to make sense of Mamba, which is
| increasingly popular.
| imjonse wrote:
| I did not mean it in a negative way; this is a great
| resource. But the math will be intimidating regardless for
| most devs who don't have a solid math/signal processing
| background. It's way beyond the simple linear algebra plus
| chain rule from calculus that are required to understand
| basic neural network training.
| ptojr wrote:
| Can someone point me to the DSS and S4D papers?
| efm wrote:
| DSS: Diagonal State Spaces are as Effective as Structured
| State Spaces (https://arxiv.org/abs/2203.14343)
|
| S4D: On the Parameterization and Initialization of Diagonal
| State Space Models (https://arxiv.org/abs/2206.11893)
| ssivark wrote:
| [delayed]
| adamnemecek wrote:
| All machine learning is convolution.
| srush wrote:
| Hi! Blog author. This was an attempt a couple years ago to
| understand and write about this paper in a detailed way. Here is
| a video going through this topic as well:
| https://youtu.be/dKJEpOtVgXc?si=PDNO0B0qi6ARHaeb
|
| Section 2 of the blog post is no longer very relevant. A lot of
| advances (DSS, S4D) simplified that part of the process. Arguably
| this should all be updated for Mamba as well (same authors).
| radarsat1 wrote:
| This was an excellent write-up, thanks. It'll help me understand
| the Mamba work a lot more.
|
| I still find it really confusing how a linear model can perform
| so well.
| jwuphysics wrote:
| Thanks for your spectacular resources! I see that you began an
| Annotated Mamba repository -- any chance you could share when
| that blog page might go live?
| medv wrote:
| What do I need to learn to start understanding these articles?
| Are there any good courses on the topic for beginners?
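
As a concrete entry point to the state space ideas discussed in this
thread, here is a minimal sketch of a discrete, diagonal state space
model, the setting that DSS and S4D work in. It is not code from the
blog post. It assumes a real-valued diagonal state matrix and made-up
toy parameters; the function names (ssm_recurrent, ssm_kernel,
ssm_convolutional) are hypothetical, chosen for illustration. The
sketch shows that, because the model is linear, stepping it like an
RNN and applying it as a causal convolution give the same output.

```python
# Toy diagonal state space model (SSM), for illustration only.
# Recurrence:  x[k] = a * x[k-1] + b * u[k]   (elementwise, since a is diagonal)
# Output:      y[k] = sum_n c[n] * x[k][n]
# a, b, c are length-N lists; u is the scalar input sequence.

def ssm_recurrent(a, b, c, u):
    """Run the SSM one step at a time, like an RNN."""
    x = [0.0] * len(a)  # initial state is zero
    ys = []
    for u_k in u:
        x = [a_n * x_n + b_n * u_k for a_n, x_n, b_n in zip(a, x, b)]
        ys.append(sum(c_n * x_n for c_n, x_n in zip(c, x)))
    return ys

def ssm_kernel(a, b, c, length):
    """Convolution kernel K[l] = sum_n c[n] * a[n]**l * b[n]."""
    return [
        sum(c_n * (a_n ** l) * b_n for a_n, b_n, c_n in zip(a, b, c))
        for l in range(length)
    ]

def ssm_convolutional(a, b, c, u):
    """Same output via a causal convolution: y[k] = sum_j K[j] * u[k-j]."""
    K = ssm_kernel(a, b, c, len(u))
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

# Made-up toy parameters (assumptions, not values from any paper).
a = [0.9, 0.5, -0.3]
b = [1.0, 1.0, 1.0]
c = [0.2, -0.4, 0.7]
u = [1.0, 0.0, 2.0, -1.0, 0.5]

y_rec = ssm_recurrent(a, b, c, u)
y_conv = ssm_convolutional(a, b, c, u)

# Both views of the model agree up to floating-point error.
assert all(abs(p - q) < 1e-12 for p, q in zip(y_rec, y_conv))
```

The convolution here is written as a plain double loop for clarity. It
is quadratic in the sequence length, so a practical implementation
would more likely use an FFT for that step.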
___________________________________________________________________
(page generated 2024-02-12 23:00 UTC)