https://www.sscardapane.it/alice-book

Simone Scardapane

Book: Alice's Adventures in a differentiable wonderland

Neural networks surround us, in the form of large language models,
speech transcription systems, molecular discovery algorithms,
robotics, and much more. Stripped of anything else, neural networks
are compositions of differentiable primitives, and studying them
means learning how to program and how to interact with these models,
a particular example of what is called differentiable programming.

This primer is an introduction to this fascinating field imagined for
someone, like Alice, who has just ventured into this strange
differentiable wonderland. I give an overview of the basics of
optimizing a function via automatic differentiation, and of a
selection of the most common designs for handling sequences, graphs,
text, and audio. The focus is on an intuitive, self-contained
introduction to the most
important design techniques, including convolutional, attentional,
and recurrent blocks, hoping to bridge the gap between theory and
code (PyTorch and JAX) and leaving the reader capable of
understanding some of the most advanced models out there, such as
large language models (LLMs) and multimodal architectures.
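
As a minimal sketch of what "optimizing a function via automatic
differentiation" can look like, the plain-Python example below builds
a tiny model as a composition of differentiable primitives and fits
its single parameter by gradient descent. The book itself works in
PyTorch and JAX; the `Dual` class (forward-mode differentiation with
dual numbers), the `model` and `loss` functions, and the learning rate
and step count are all illustrative assumptions and are not taken from
the book.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(x):
        # Treat plain numbers as constants (derivative zero).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def tanh(x):
    # Differentiable primitive: d/dx tanh(x) = 1 - tanh(x)^2.
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)


def model(w, x):
    # Composition of two primitives: a scaling followed by tanh.
    return tanh(w * x)


def loss(w, x=1.0, y=0.5):
    # Squared error on a single (hypothetical) input/target pair.
    err = model(w, x) - y
    return err * err


def grad(f, w):
    # Seed the input derivative with 1 and read off d f / d w.
    return f(Dual(w, 1.0)).der


def gradient_descent(f, w0, lr=0.5, steps=200):
    w = w0
    for _ in range(steps):
        w -= lr * grad(f, w)
    return w


w_star = gradient_descent(loss, w0=0.0)
print(w_star, loss(Dual(w_star)).val)
```

Frameworks such as PyTorch and JAX automate the same idea at scale,
typically with reverse-mode differentiation, which is more efficient
when a model has many parameters and a scalar loss.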

Table of contents

The book is currently in draft form and available for feedback and
beta reading from arXiv: arXiv preprint 2404.17625.

 1. Foreword and introduction
 2. Mathematical preliminaries
 3. Datasets and losses
 4. Linear models
 5. Fully-connected layers
 6. Automatic differentiation
 7. Convolutive layers
 8. Convolutions beyond images
 9. Scaling up models
10. Transformer models
11. Transformers in practice
12. Graph layers
13. Recurrent layers
14. Appendix A: Probability theory
15. Appendix B: Universal approximation in 1D

Additional chapters

I will publish here additional chapters on more advanced material
that I could not fit into the first volume. Eventually, I hope these
will be part of a second volume. More probably, they will languish
here forever.

 1. Model re-use (including parameter-efficient fine-tuning and model
    merging).
 2. Density estimation and generative modelling.
 3. Conditional computation (mixture-of-experts, early exits).
 4. Metric and self-supervised learning.
 5. Debugging and understanding the models.

(c) 2024 Simone Scardapane.