[HN Gopher] An Introduction to Probabilistic Programming
       ___________________________________________________________________
        
       An Introduction to Probabilistic Programming
        
       Author : homarp
       Score  : 91 points
        Date   : 2021-10-21 06:42 UTC (1 day ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | platz wrote:
       | is Probabilistic Programming the same thing as doing MCMC (Markov
       | chain Monte Carlo)? How do these two ideas relate? Or is one a
       | subset of the other?
        
         | craigacp wrote:
         | Probabilistic programming can be done via MCMC approaches, but
         | you can also infer the necessary quantities by using
         | variational inference (which approximates the distribution
         | described by your program with something that's simpler and
         | easier to estimate).
         | 
         | Basically probabilistic programming is a way of describing a
         | distribution, and then MCMC is one way of inferring the
         | quantities in that distribution.
        
         | yt-sdb wrote:
         | Probabilistic programming uses computer science techniques to
         | do automated statistical modeling. For example, imagine I have
         | a coin, and I want to discover if it is biased, i.e. if it
          | lands on heads more often than tails. In a probabilistic
          | programming framework such as Stan [1], I can express my
          | model as a simple Bernoulli model, `x ~ Bernoulli(p)`, and
          | then automatically estimate the bias parameter `p` given
          | some data (do "inference").
         | 
         | You can easily do this calculation by hand or in Python, but
         | this does not generalize to more complex real-world scenarios.
         | For complex probabilistic models, we must rely on numerical
         | approximations. MCMC is just one algorithm for doing this
         | approximate inference. Another popular technique is called
         | variational inference [2]. Another commenter mentioned HMC [3],
         | which is just a specific instance of MCMC.
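          | 
          | For the coin example specifically, the posterior is even
          | available in closed form with a conjugate Beta prior. A
          | hedged sketch in plain Python (illustrative names, not any
          | particular PPL's API):

```python
# Beta-Bernoulli conjugate update: with a Beta(a, b) prior on the
# coin's bias p, observing some heads and tails gives the posterior
# Beta(a + heads, b + tails) in closed form -- no sampler needed.
def beta_bernoulli_posterior(heads, tails, a=1.0, b=1.0):
    return a + heads, b + tails

def posterior_mean(a, b):
    # The mean of a Beta(a, b) distribution is a / (a + b).
    return a / (a + b)

# 7 heads out of 10 flips under a flat Beta(1, 1) prior:
a_post, b_post = beta_bernoulli_posterior(7, 3)
mean = posterior_mean(a_post, b_post)  # 8/12, i.e. about 0.667
```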
         | 
         | [1] https://mc-stan.org/
         | 
         | [2] https://arxiv.org/abs/1601.00670
         | 
         | [3] https://arxiv.org/abs/1206.1901
        
         | shoo wrote:
         | I'd argue that probabilistic programming is a language or
         | framework of programming that lets you easily build and then
         | fit probabilistic models to estimate or predict things of
         | interest.
         | 
         | If you're taking a Bayesian approach to statistical modelling
         | and inference, then they're probably a fairly good tool to
         | consider. With the Bayesian approach you're trying to compute
         | some posterior probability distribution that summarises your
         | prior information (this might capture domain knowledge,
         | information from related studies) and information from
         | observations.
         | 
         | There are different ways to compute a posterior distribution.
         | In very simple or contrived cases you might be able to manually
         | grind out an answer analytically with pen and paper and lots of
         | algebra and integrals. But that isn't very efficient or
         | scalable. MCMC can be used to estimate the integrals you need.
         | MCMC isn't the only way to estimate or approximate these
         | calculations -- e.g. another approach is variational inference
         | where a bunch of approximations are introduced to replace the
         | original calculation with an approximation that is easier to
         | compute -- this likely introduces bias into the results but can
         | give you something that can then be solved analytically or semi
         | analytically (e.g. approximate everything as Gaussian
         | distributions and a lot of integration collapses to efficiently
         | computable algebraic identities).
         | 
         | Some probabilistic programming platforms like Stan let you
         | define your probabilistic model and parameters and decouple it
         | from the computational backend used to estimate the posterior
         | distribution. E.g. in Stan you can switch the computational
         | backend between MCMC (https://mc-stan.org/docs/2_18/stan-users-
         | guide/sampling-diff...) and ADVI (auto-differentiation
         | variational inference).
         | 
          | MCMC has practical problems in that it is only guaranteed
          | to give you the correct (unbiased) estimate asymptotically,
          | in the limit of running the chain forever. If you're
         | trying to approximate the integral of a function that is very
         | multi-modal -- where it would be difficult for a global
          | optimisation algorithm to locate the global optimum -- then
          | MCMC
         | will likely also struggle to produce a good estimate. MCMC is
         | difficult to parallelise effectively as the algorithm is
         | inherently like an iterative local search procedure -- the next
         | state in the chain is some mutation of the previous state. You
         | can run n MCMC chains in parallel from n different initial
         | configurations, but it's not obvious that you'll get a better
         | estimate from n short chains vs a single long chain -- the
         | longer a chain runs, the more chance it has of being able to
         | discover and explore higher probability (more realistic, more
         | plausible) configurations of the parameter space.
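          | 
          | That "iterative local search" flavour can be sketched as a
          | minimal random-walk Metropolis sampler (a toy illustration,
          | not production code; the function names are made up):

```python
import math
import random

def metropolis(log_prob, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: propose a small mutation of the
    current state and accept it with probability min(1, p(x')/p(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_prob(x0)
    chain = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)
        lp_new = log_prob(x_new)
        # Accept or reject the proposed mutation.
        if math.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        chain.append(x)
    return chain

# Target: a standard normal, specified only up to a constant.
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
mean = sum(chain) / len(chain)  # should land near 0
```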
         | 
         | MCMC isn't only used for probabilistic programming, you can
         | apply it for other things. E.g. it gets used in material
         | science to study statistical properties of molecular dynamics
         | simulations etc.
        
          | FuriouslyAdrift wrote:
          | I think, and this is WAY outside my area, that Hamiltonian
          | Monte Carlo (a Markov chain Monte Carlo method) is used.
          | Beyond that, I'm lost.
        
       | wodenokoto wrote:
       | Has probabilistic programming been shown to solve problems better
       | than machine learning or deep learning approaches?
       | 
       | I remember it being pretty hyped 5-6 years ago ...
        
         | vibrio_bobtail wrote:
         | I wouldn't contrast it with machine learning or deep learning.
         | Probabilistic programming is focused on building languages or
         | libraries that incorporate fundamental probabilistic building
         | blocks, model building statements and inference strategies as
          | first class citizens within the language. It's been
          | remarkably successful: Stan, Pyro, PyMC, TensorFlow
          | Probability, JAGS, BUGS and Turing have all been used to
          | tackle challenging and diverse problems with probabilistic
          | modeling. The more modern probabilistic programming
          | languages are designed to incorporate the advances of deep
          | learning by making it easy to embed ANNs into models, using
          | them to parameterize random variables; Pyro and TensorFlow
          | Probability are probably the best examples of this.
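          | 
          | The "ANNs parameterize random variables" idea can be
          | sketched in plain Python (a toy one-layer model; this is
          | not the actual Pyro or TensorFlow Probability API):

```python
import math

# Toy sketch: a one-layer "network" maps a feature vector to the
# parameter of a Bernoulli random variable. In deep probabilistic
# programming the weights would be learned, and the Bernoulli would
# sit inside a larger generative model.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_param(weights, features, bias=0.0):
    # The network output parameterizes p in x ~ Bernoulli(p).
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)

# Pre-activation here is 0.5 * 2.0 - 1.0 * 1.0 = 0, so p = 0.5.
p = bernoulli_param([0.5, -1.0], [2.0, 1.0])
```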
        
         | medo-bear wrote:
         | note that these are not exclusive. you could divide ML into a
         | traditional statistical approach and a probabilistic one that
         | is concerned with deriving the underlying probability
         | distribution. probabilistic programming is kind of like a
          | domain specific language for achieving this. there is also
          | differentiable programming, which works on a similar
          | principle. there are certainly industrial usages of this
          | paradigm. look up pyro
          | (http://pyro.ai/examples/intro_part_i.html) for ppl and jax
          | (https://github.com/google/jax) for differentiable
          | programming
        
       | joe_the_user wrote:
       | I like this passage: _In this and the next two chapters of this
       | introduction we will present the key ideas of probabilistic
       | programming using a carefully designed first-order probabilistic
       | programming language (FOPPL). The FOPPL includes most common
       | features of programming languages, such as conditional statements
       | (e.g. if), primitive operations (e.g. +,-, etc.), and user-
       | defined functions. The restrictions that we impose are that
       | functions must be first order, which is to say that functions
       | cannot accept other functions as arguments, and that they cannot
       | be recursive. These two restrictions result in a language where
       | models describe distributions over a finite number of random
       | variables. In terms of expressivity, this places the FOPPL on
       | even footing with many existing languages and libraries for
       | automating inference in graphical models with finite graphs._
       | 
       | This gives a nice picture of what's happening. At the same time,
       | does this mean that in the end, you're basically operating on a
       | single distribution with only a few canned global
       | transformations?
        
       | medo-bear wrote:
       | nice! also interesting that they have chosen a lisp syntax for
       | the book. note however that the book is written to be language
       | agnostic
       | 
       | > It is a Lisp-like language which, by virtue of its syntactic
       | simplicity, also makes for efficient and easy meta-programming,
       | an approach many implementors will take. That said, the real
       | substance of this book is language agnostic and the main points
       | should be understood in this light.
        
       | otiose_tortoise wrote:
       | If you're interested in probabilistic programming and want
       | something a little more hands-on, I recommend The Design and
       | Implementation of Probabilistic Programming Languages
       | http://dippl.org/ . It's an online course/textbook that gets you
       | programming right away and makes the power of probabilistic
       | programming immediately clear.
        
         | medo-bear wrote:
         | thanks for this. there is also Probabilistic Models of
         | Cognition [0] by one of the authors. I wish however that they
         | stuck to Church language [1]
         | 
         | [0] https://probmods.org/
         | 
         | [1]
         | http://web.stanford.edu/~ngoodman/papers/POPL2013-abstract.p...
        
           | otiose_tortoise wrote:
           | I too preferred church, though I understand why the authors
           | of dippl chose a more popular language (javascript) for their
           | book. Church (a dialect of Scheme, which is a dialect of
           | Lisp) also ties probabilistic programming back to its
           | intellectual roots in McCarthy's amb operator [1].
           | 
           | That said, you can get pretty far with probabilistic
           | programming in any language with decent monad support [2]. I
           | did most of my probabilistic programming work in Scala. (You
           | lose the ability to do really fancy inference if you go the
           | monad route, as you can't analyze the program structure, but
           | a lot of the time, this is fine.)
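            | 
            | The monad route can be sketched in a few lines -- here in
            | Python rather than Scala, with a toy discrete
            | distribution type (not the library described in [2]):

```python
# A distribution is a dict mapping outcomes to probabilities.
def unit(x):
    # The trivial distribution: x with probability 1.
    return {x: 1.0}

def bind(dist, f):
    # Thread probability mass through a dependent computation f,
    # which maps each outcome to another distribution.
    out = {}
    for x, px in dist.items():
        for y, py in f(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out

coin = {"H": 0.5, "T": 0.5}
# Flip twice and count heads -- inference by exhaustive enumeration.
two_flips = bind(coin, lambda a:
                 bind(coin, lambda b:
                      unit((a == "H") + (b == "H"))))
# two_flips == {0: 0.25, 1: 0.5, 2: 0.25}
```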
           | 
           | [1] http://community.schemewiki.org/?amb
           | 
           | [2] http://mlg.eng.cam.ac.uk/pub/pdf/SciGhaGor15.pdf
        
           | wodenokoto wrote:
           | I wish I could have Church as a Jupyter notebook/lab Kernel.
           | 
           | Would make it much easier to play around with the language
           | when trying to wrap my mind around the church version of
           | probmods.
        
         | fnord77 wrote:
         | does this require stats/probability knowledge?
         | 
         | on that topic, can anyone recommend an online stats/probability
         | course? I tried the coursera one by Sebastian Thrun and
         | couldn't get far into it because the "TA" examples were
         | unintelligible.
        
           | medo-bear wrote:
            | for machine learning i can recommend the course by
            | Philipp Hennig from the University of Tuebingen
           | 
           | https://www.youtube.com/playlist?list=PL05umP7R6ij1tHaOFY96m.
           | ..
        
           | tbarringer816 wrote:
           | Stat110[0] (Harvard) on EdX! The professor, Joe Blitzstein,
           | is incredible. Easily one of the best classes I have ever
           | taken.
           | 
           | [0] https://www.edx.org/course/introduction-to-probability
        
       | kristjansson wrote:
        | NB: updated as of two days ago, despite the 2018-era (1809)
        | arXiv identifier.
        
       ___________________________________________________________________
       (page generated 2021-10-22 23:00 UTC)