[HN Gopher] An Introduction to Probabilistic Programming
___________________________________________________________________
An Introduction to Probabilistic Programming
Author : homarp
Score : 91 points
Date : 2021-10-21 06:42 UTC (1 day ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| platz wrote:
| is Probabilistic Programming the same thing as doing MCMC (Markov
| chain Monte Carlo)? How do these two ideas relate? Or is one a
| subset of the other?
| craigacp wrote:
| Probabilistic programming can be done via MCMC approaches, but
| you can also infer the necessary quantities by using
| variational inference (which approximates the distribution
| described by your program with something that's simpler and
| easier to estimate).
|
| Basically probabilistic programming is a way of describing a
| distribution, and then MCMC is one way of inferring the
| quantities in that distribution.
| yt-sdb wrote:
| Probabilistic programming uses computer science techniques to
| do automated statistical modeling. For example, imagine I have
| a coin, and I want to discover if it is biased, i.e. if it
| lands on heads more often than tails. In a probabilistic
| programming framework such as Stan [1], I can express my model
| as a simple Bernoulli model, `x ~ Bernoulli(p)`, and then
| automatically estimate the bias parameter `p` given some data
| (do "inference").
|
| You can easily do this calculation by hand or in Python, but
| this does not generalize to more complex real-world scenarios.
| For complex probabilistic models, we must rely on numerical
| approximations. MCMC is just one family of algorithms for doing
| this approximate inference. Another popular technique is called
| variational inference [2]. Another commenter mentioned HMC [3],
| which is a specific instance of MCMC.
|
| [1] https://mc-stan.org/
|
| [2] https://arxiv.org/abs/1601.00670
|
| [3] https://arxiv.org/abs/1206.1901
| shoo wrote:
| I'd argue that probabilistic programming is a programming
| language or framework that lets you easily build, and then fit,
| probabilistic models to estimate or predict things of
| interest.
|
| If you're taking a Bayesian approach to statistical modelling
| and inference, then they're probably a fairly good tool to
| consider. With the Bayesian approach you're trying to compute a
| posterior probability distribution that combines your prior
| information (which might capture domain knowledge or
| information from related studies) with information from
| observations.
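|
| In symbols, this is Bayes' rule (with theta the parameters and
| x the observations):
|
|     p(theta | x) = p(x | theta) * p(theta) / p(x)
|
| i.e. posterior = likelihood * prior / evidence.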
|
| There are different ways to compute a posterior distribution.
| In very simple or contrived cases you might be able to manually
| grind out an answer analytically with pen and paper and lots of
| algebra and integrals. But that isn't very efficient or
| scalable. MCMC can be used to estimate the integrals you need,
| but it isn't the only way. Another approach is variational
| inference, where a bunch of approximations are introduced to
| replace the original calculation with one that is easier to
| compute. This likely introduces bias into the results, but can
| give you something that can then be solved analytically or
| semi-analytically (e.g. approximate everything as Gaussian
| distributions, and a lot of the integration collapses to
| efficiently computable algebraic identities).
|
| Some probabilistic programming platforms like Stan let you
| define your probabilistic model and parameters and decouple it
| from the computational backend used to estimate the posterior
| distribution. E.g. in Stan you can switch the computational
| backend between MCMC (https://mc-stan.org/docs/2_18/stan-users-
| guide/sampling-diff...) and ADVI (automatic differentiation
| variational inference).
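|
| The same switch in PyMC3 looks roughly like this (reusing the
| hypothetical coin model sketched in a comment above; Stan's own
| interfaces differ in detail):
|
|     with pm.Model():
|         p = pm.Beta("p", alpha=1, beta=1)
|         pm.Bernoulli("x", p=p, observed=flips)
|         trace = pm.sample(2000, chains=4)  # MCMC backend (NUTS)
|         approx = pm.fit(method="advi")     # variational backend
|         vi_trace = approx.sample(2000)     # draws from the VI fit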
|
| MCMC has practical problems in that it is only guaranteed to
| give you the correct (unbiased) estimate asymptotically, in the
| limit as you run it for an infinite amount of time. If you're
| trying to approximate the integral of a function that is very
| multi-modal -- where it would be difficult for a global
| optimisation algorithm to locate the global optimum -- then MCMC
| will likely also struggle to produce a good estimate. MCMC is
| difficult to parallelise effectively as the algorithm is
| inherently like an iterative local search procedure -- the next
| state in the chain is some mutation of the previous state. You
| can run n MCMC chains in parallel from n different initial
| configurations, but it's not obvious that you'll get a better
| estimate from n short chains vs a single long chain -- the
| longer a chain runs, the more chance it has of being able to
| discover and explore higher probability (more realistic, more
| plausible) configurations of the parameter space.
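|
| To make the "mutation of the previous state" picture concrete,
| here is a minimal Metropolis sampler (the simplest MCMC
| variant), targeting a standard normal density purely for
| illustration:
|
|     import math, random
|
|     def log_target(x):
|         return -0.5 * x * x  # log density of N(0, 1), up to a constant
|
|     def metropolis(n_steps, step=1.0):
|         x, samples = 0.0, []
|         for _ in range(n_steps):
|             proposal = x + random.gauss(0.0, step)  # mutate current state
|             # accept with probability min(1, target(proposal) / target(x))
|             if math.log(random.random()) < log_target(proposal) - log_target(x):
|                 x = proposal
|             samples.append(x)  # on rejection, the old state repeats
|         return samples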
|
| MCMC isn't only used for probabilistic programming; you can
| apply it to other things. E.g. it gets used in materials
| science to study statistical properties of molecular dynamics
| simulations, etc.
| FuriouslyAdrift wrote:
| I think, and this is WAY outside my area, that Hamiltonian
| Monte Carlo (a Markov chain Monte Carlo method) is used. Beyond
| that, I'm lost.
| wodenokoto wrote:
| Has probabilistic programming been shown to solve problems better
| than machine learning or deep learning approaches?
|
| I remember it being pretty hyped 5-6 years ago ...
| vibrio_bobtail wrote:
| I wouldn't contrast it with machine learning or deep learning.
| Probabilistic programming is focused on building languages or
| libraries that incorporate fundamental probabilistic building
| blocks, model-building statements and inference strategies as
| first-class citizens within the language. It's been remarkably
| successful: check out Stan, Pyro, PyMC, TensorFlow Probability,
| JAGS, BUGS and Turing as very successful projects that have
| been used to tackle challenging and diverse problems with
| probabilistic modeling. The more modern probabilistic
| programming languages are actually designed to incorporate the
| advances of deep learning by making it easy to embed ANNs into
| models by using them to parameterize random variables; Pyro and
| TensorFlow Probability are probably the best examples of this.
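|
| As a rough sketch of that last idea in Pyro (the tiny network
| and model here are invented for illustration), a neural net
| parameterizing the mean of a Normal random variable:
|
|     import torch
|     import pyro
|     import pyro.distributions as dist
|
|     net = torch.nn.Linear(1, 1)  # a (very) small ANN
|
|     def model(x, y=None):
|         pyro.module("net", net)    # register the net's params with Pyro
|         mean = net(x).squeeze(-1)  # ANN output parameterizes the mean
|         sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
|         with pyro.plate("data", x.shape[0]):
|             return pyro.sample("obs", dist.Normal(mean, sigma), obs=y)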
| medo-bear wrote:
| note that these are not exclusive. you could divide ML into a
| traditional statistical approach and a probabilistic one that
| is concerned with deriving the underlying probability
| distribution. probabilistic programming is kind of like a
| domain-specific language for achieving this. there is also
| differentiable programming, which works on the same principle.
| there are certainly industrial uses of this paradigm. look up
| pyro (http://pyro.ai/examples/intro_part_i.html) for PPLs and
| jax (https://github.com/google/jax) for differentiable
| programming.
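|
| for a taste of the differentiable-programming side, a tiny jax
| example (invented here just to show the idea):
|
|     import jax
|
|     f = lambda x: x ** 3     # an ordinary python function
|     print(jax.grad(f)(2.0))  # 12.0, i.e. the derivative 3x^2 at x=2
|
| jax.grad transforms f into a new function that computes its
| derivative.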
| joe_the_user wrote:
| I like this passage: _In this and the next two chapters of this
| introduction we will present the key ideas of probabilistic
| programming using a carefully designed first-order probabilistic
| programming language (FOPPL). The FOPPL includes most common
| features of programming languages, such as conditional statements
| (e.g. if), primitive operations (e.g. +,-, etc.), and user-
| defined functions. The restrictions that we impose are that
| functions must be first order, which is to say that functions
| cannot accept other functions as arguments, and that they cannot
| be recursive. These two restrictions result in a language where
| models describe distributions over a finite number of random
| variables. In terms of expressivity, this places the FOPPL on
| even footing with many existing languages and libraries for
| automating inference in graphical models with finite graphs._
|
| This gives a nice picture of what's happening. At the same time,
| does this mean that in the end, you're basically operating on a
| single distribution with only a few canned global
| transformations?
| medo-bear wrote:
| nice! also interesting that they have chosen a Lisp syntax for
| the book. note however that the book is written to be language
| agnostic
|
| > It is a Lisp-like language which, by virtue of its syntactic
| simplicity, also makes for efficient and easy meta-programming,
| an approach many implementors will take. That said, the real
| substance of this book is language agnostic and the main points
| should be understood in this light.
| otiose_tortoise wrote:
| If you're interested in probabilistic programming and want
| something a little more hands-on, I recommend The Design and
| Implementation of Probabilistic Programming Languages
| (http://dippl.org/). It's an online course/textbook that gets you
| programming right away and makes the power of probabilistic
| programming immediately clear.
| medo-bear wrote:
| thanks for this. there is also Probabilistic Models of
| Cognition [0] by one of the authors. I wish however that they
| had stuck to the Church language [1]
|
| [0] https://probmods.org/
|
| [1]
| http://web.stanford.edu/~ngoodman/papers/POPL2013-abstract.p...
| otiose_tortoise wrote:
| I too preferred Church, though I understand why the authors
| of dippl chose a more popular language (JavaScript) for their
| book. Church (a dialect of Scheme, which is a dialect of
| Lisp) also ties probabilistic programming back to its
| intellectual roots in McCarthy's amb operator [1].
|
| That said, you can get pretty far with probabilistic
| programming in any language with decent monad support [2]. I
| did most of my probabilistic programming work in Scala. (You
| lose the ability to do really fancy inference if you go the
| monad route, as you can't analyze the program structure, but
| a lot of the time, this is fine.)
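|
| A minimal sketch of that monadic approach (in Python rather
| than Scala, for brevity): represent a discrete distribution as
| a list of (value, probability) pairs, with unit and bind:
|
|     from collections import defaultdict
|
|     def unit(x):
|         return [(x, 1.0)]  # a point-mass distribution
|
|     def bind(dist, f):
|         out = defaultdict(float)
|         for v, p in dist:
|             for w, q in f(v):  # f maps a value to a distribution
|                 out[w] += p * q
|         return list(out.items())
|
|     coin = [(0, 0.5), (1, 0.5)]
|     # distribution of the sum of two coin flips:
|     two = bind(coin, lambda a: bind(coin, lambda b: unit(a + b)))
|     # -> [(0, 0.25), (1, 0.5), (2, 0.25)]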
|
| [1] http://community.schemewiki.org/?amb
|
| [2] http://mlg.eng.cam.ac.uk/pub/pdf/SciGhaGor15.pdf
| wodenokoto wrote:
| I wish I could have Church as a Jupyter notebook/lab Kernel.
|
| Would make it much easier to play around with the language
| when trying to wrap my mind around the Church version of
| probmods.
| fnord77 wrote:
| does this require stats/probability knowledge?
|
| on that topic, can anyone recommend an online stats/probability
| course? I tried the Coursera one by Sebastian Thrun and
| couldn't get far into it because the "TA" examples were
| unintelligible.
| medo-bear wrote:
| for machine learning I can recommend the course by Philipp
| Hennig from the University of Tuebingen
|
| https://www.youtube.com/playlist?list=PL05umP7R6ij1tHaOFY96m.
| ..
| tbarringer816 wrote:
| Stat 110 [0] (Harvard) on edX! The professor, Joe Blitzstein,
| is incredible. Easily one of the best classes I have ever
| taken.
|
| [0] https://www.edx.org/course/introduction-to-probability
| kristjansson wrote:
| NB: updated as of two days ago, despite the 1809 arXiv ID in
| the title.
___________________________________________________________________
(page generated 2021-10-22 23:00 UTC)