[HN Gopher] Defining Statistical Models in Jax?
___________________________________________________________________
Defining Statistical Models in Jax?
Author : hackandthink
Score : 99 points
Date : 2024-10-13 15:20 UTC (4 days ago)
(HTM) web link (statmodeling.stat.columbia.edu)
(TXT) w3m dump (statmodeling.stat.columbia.edu)
| JHonaker wrote:
| I'm very excited by the work being put in to make Bayesian
| inference more manageable. It's in a spot that feels very similar
| to deep learning circa mid-2010s when Caffe, Torch, and hand-
| written gradients were the options. We _can_ do it, but doing
| anything more complicated than common model structures like
| hierarchical Gaussian linear models requires dropping out of the
| nice places and into the guts.
|
| I've had a lot of success with NumPyro (a JAX library), and used
| quite a lot of tools that are simpler interfaces to Stan. I've
| also had to write quite a few model-specific things from scratch
| by hand (more for sequential Monte Carlo than MCMC). I'm very
| excited for a world where PPLs become scalable and easier to
| use/customize.
|
| > I think there is a good chance that normalizing flow-based
| variational inference will displace MCMC as the go-to method for
| Bayesian posterior inference as soon as everyone gets access to
| good GPUs.
|
| Wow. This is incredibly surprising. I'm only tangentially aware
| of normalizing flows, but apparently I need to look at the
| intersection of them and Bayesian statistics now! Any sources
| from anyone would be most appreciated!
| sarosh wrote:
| Defer to other experts, but (briefly) normalizing flows are a
| method for constructing complex distributions by transforming a
| probability density through a series of invertible
| transformations. Normalizing flows are trained using a plain
| log-likelihood function, and they are capable of exact density
| evaluation and efficient sampling. See:
|
| Danilo Rezende and Shakir Mohamed. Variational inference with
| normalizing flows. In ICML, 2015. Link:
| https://bigdata.duke.edu/wp-content/uploads/2022/08/1505.057...
|
| Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear
| independent components estimation. In ICLR Workshop, 2015.
| Link: https://arxiv.org/pdf/1410.8516
|
| And for your direct question, the following paper "Efficient
| Bayesian Sampling Using Normalizing Flows to Assist Markov
| Chain Monte Carlo Methods" appears upon a superficial glance to
| be relevant. Link: https://arxiv.org/pdf/2107.08001
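
The "trained using a plain log-likelihood" point can be made concrete with the smallest possible flow. This is a minimal sketch, not taken from any of the linked papers: it assumes a single affine layer x = mu + exp(s) * z over a standard normal base, and the names `flow_log_prob`, `flow_sample` and `fit` are hypothetical. Because the layer is invertible, the density is exact (base log density minus the log-Jacobian `s`), and plain gradient ascent on the mean log-likelihood recovers the sample mean and standard deviation.

```python
import math
import random


def log_normal_pdf(z):
    """Log density of the standard normal base distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def flow_log_prob(x, mu, s):
    """Exact log density of x under the flow x = mu + exp(s) * z, z ~ N(0, 1)."""
    z = (x - mu) * math.exp(-s)   # inverse transform
    return log_normal_pdf(z) - s  # subtract log|dx/dz| = s


def flow_sample(rng, mu, s):
    """Sampling is one forward pass through the transform."""
    return mu + math.exp(s) * rng.gauss(0.0, 1.0)


def fit(data, steps=2000, lr=0.05):
    """Gradient ascent on the mean log-likelihood (hand-derived gradients)."""
    mu, s = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        zs = [(x - mu) * math.exp(-s) for x in data]
        grad_mu = sum(zs) / n * math.exp(-s)      # d/dmu of mean log p
        grad_s = sum(z * z for z in zs) / n - 1.0  # d/ds of mean log p
        mu += lr * grad_mu
        s += lr * grad_s
    return mu, s


rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(500)]
mu, s = fit(data)
print("fitted mean:", mu, "fitted std:", math.exp(s))
```

Real flows stack many such layers with learned nonlinear pieces and get their gradients from autodiff (for example in JAX); the objective stays the same.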
| JHonaker wrote:
| Thanks! I've read the first one before. I'll take a look at
| the other two!
| 1980phipsi wrote:
| So it's like converting a normal distribution to log normal
| (and then back). But a more general way of thinking about it.
|
| Where does the name "normalizing flows" come from?
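
The normal/log-normal analogy can be checked directly. A minimal sketch, assuming the transform is x = exp(z) with z drawn from a normal distribution; the function names are hypothetical. The density obtained by inverting the transform and multiplying by |dz/dx| = 1/x matches the textbook log-normal formula, and the round trip recovers z.

```python
import math


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def forward(z):
    """Normal -> log-normal."""
    return math.exp(z)


def inverse(x):
    """Log-normal -> normal (the 'and then back' direction)."""
    return math.log(x)


def lognormal_pdf_via_flow(x, mu=0.0, sigma=1.0):
    """Change of variables: p_X(x) = p_Z(inverse(x)) * |d inverse / dx|."""
    return normal_pdf(inverse(x), mu, sigma) * (1.0 / x)


def lognormal_pdf_closed_form(x, mu=0.0, sigma=1.0):
    """Textbook log-normal density, for comparison."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)) / (
        x * sigma * math.sqrt(2.0 * math.pi)
    )


for x in (0.1, 1.0, 2.5, 10.0):
    print(x, lognormal_pdf_via_flow(x, 0.3, 0.7), lognormal_pdf_closed_form(x, 0.3, 0.7))
print("round trip:", inverse(forward(1.234)))
```

A flow generalizes this by replacing the fixed `exp` with a learned invertible function while keeping the same bookkeeping.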
| hotstickyballs wrote:
| It comes from the Jacobian, which you can get from autodiff.
| It measures how much distortion the function created and
| normalizes for it so that you can integrate correctly without
| blowing up gradients.
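
One way to read the Jacobian remark is as a statement about normalization: the transformed density integrates to 1 only when the Jacobian factor is included. The sketch below is an illustration under assumptions, not a definition of the term: it pushes a standard normal through x = sinh(z) (an arbitrary invertible choice) and integrates the result numerically with a hand-written trapezoid rule. Without the factor, the integral comes out near exp(1/2) instead of 1.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def pushed_density(x, with_jacobian=True):
    """Density of x = sinh(z), z ~ N(0, 1), with or without the Jacobian factor."""
    z = math.asinh(x)                      # inverse transform
    base = normal_pdf(z)
    if with_jacobian:
        return base / math.sqrt(1.0 + x * x)  # |dz/dx| = 1 / sqrt(1 + x^2)
    return base


def integrate(f, lo, hi, n):
    """Composite trapezoid rule with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


with_jac = integrate(lambda x: pushed_density(x, True), -2000.0, 2000.0, 400_000)
without_jac = integrate(lambda x: pushed_density(x, False), -2000.0, 2000.0, 400_000)
print("with Jacobian:", with_jac)
print("without Jacobian:", without_jac)
```

In higher dimensions the factor becomes the absolute determinant of the Jacobian matrix, which is why flow architectures are designed so that this determinant is cheap to compute.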
| legobmw99 wrote:
| The author links to https://arxiv.org/abs/2006.10343, which
| seems like a good place to start on normalizing flows for Bayes
| nextos wrote:
| Pyro has a nice normalizing flows tutorial:
| https://pyro.ai/examples/normalizing_flows_intro.html
| JHonaker wrote:
| Ah, I did not realize that the `realNVP` was a link! Thanks.
| sampo wrote:
| https://statmodeling.stat.columbia.edu/2024/10/08/defining-s...
|
| would be a better link than the one currently posted:
|
| https://statmodeling.stat.columbia.edu/2024/10/08/defining-s...
| techwizrd wrote:
| This is coming at the perfect time! I was recently trying to
| decide whether I wanted to implement a model in Stan or
| Pyro/NumPyro, and I've been eyeing implementing in JAX. I would
| love to write a tutorial comparing Stan to JAX.
| helltone wrote:
| Off topic: I think there are some opportunities for making Bayesian
| inference technology more accessible, and I'd love to chat with
| other people in this space. Email in my profile.
| gnulinux wrote:
| Reading this post, and reviewing the documentation of
| NumPyro/Pyro, I think I'm not following the crucial difference
| between NumPyro and Pyro. I understand that Pyro uses PyTorch as
| backend, and NumPyro uses JAX as backend, but other than that I'm
| not sure about the critical differences. If their frontend is
| about the same (which seems to be the case here) why is JAX
| mentioned in this post? Could we simply not replace Pyro with
| Stan for statistical modelling (whether with PyTorch or JAX
| backend)?
| nextos wrote:
| > Could we simply not replace Pyro with Stan for statistical
| modelling (whether with PyTorch or JAX backend)?
|
| Stan has a fantastic NUTS Monte Carlo implementation. Pyro &
| NumPyro are more focused on variational inference. For a third
| alternative that IMHO doesn't get the attention it deserves,
| take a look at Infer.NET, which excels at expectation
| propagation and uses factor graphs underneath. These three
| offer very different tradeoffs.
|
| Stan is less expressive than Pyro/NumPyro. But for the models
| it can deal with (generally medium-sized multi-level models), I
| find it extremely easy to work with. In other words, it's much
| easier to diagnose model and sampling issues.
| Iwan-Zotow wrote:
| This is a great development!
| Myrmornis wrote:
| I'm curious about the involvement of tech companies here.
| Obviously approximating posterior distributions of explicit
| statistical models via simulation techniques is common in
| academic scientific literature but I'd like to hear about
| examples of it being done in "production" settings, i.e. not just
| as a one-off analysis. I have for a long time had a vague belief
| that in production settings people usually opt for heuristics /
| point estimates etc but I haven't had much involvement with this
| sort of thing for a while.
| nextos wrote:
| Pyro was created by Uber AI Labs. Actually, by Geometric
| Intelligence, which was eventually acquired by Uber. Geometric
| Intelligence was founded by Gary Marcus, Zoubin Ghahramani and
| others. They also had Noah Goodman onboard.
|
| AFAIK, Pyro was used in production to make predictions of
| demand with careful consideration of uncertainty. I was
| contacted by one of their recruiters when I was doing work in
| this area, and this was the application they showcased.
|
| Meta is also doing a lot of related work on time series
| forecasting using Prophet, which employs Stan under the hood.
| In both cases, Bayesian methods are important to make inference
| robust; it's not just an academic exercise.
___________________________________________________________________
(page generated 2024-10-17 23:00 UTC)