[HN Gopher] Pyro: A Universal, Probabilistic Programming Language
       ___________________________________________________________________
        
       Pyro: A Universal, Probabilistic Programming Language
        
       Author : optimalsolver
       Score  : 104 points
       Date   : 2023-07-28 07:10 UTC (15 hours ago)
        
 (HTM) web link (pyro.ai)
 (TXT) w3m dump (pyro.ai)
        
       | cube2222 wrote:
       | Another cool probabilistic programming language is Turing[0] in
       | Julia.
       | 
        | Had fun using it when working my way through the Statistical
        | Rethinking series.
       | 
       | [0]: https://turing.ml/
        
       | abeppu wrote:
       | > Pyro enables flexible and expressive deep probabilistic
       | modeling, unifying the best of modern deep learning and Bayesian
       | modeling.
       | 
       | Does anyone who works in this area have a sense of why PPLs
       | haven't "taken off" really? Like, of the last several years of ML
       | surprising successes, I can't really think of any major ones that
       | come from this line of work. To the extent that Bayesian
       | perspectives contribute to deep learning, I more often see e.g.
       | some particular take on ensembling around the same models trained
       | to find a point estimate via SGD, rather than models built up
       | from random variables about which we update beliefs including
       | representation of uncertainty.
        
         | mccoyb wrote:
         | I don't think the field has converged on "the right
         | abstractions" yet.
         | 
          | It's an active area of programming language research -- it
          | feels similar to where AD was for a while.
         | 
          | I work on this stuff for my research -- so I do believe that
          | there is a really good set of abstractions. My lab has had
          | good success solving problems with these abstractions
          | (problems which you might not think are amenable to, or scale
          | well with, Bayesian techniques, like pose or trajectory
          | estimation and SLAM, with renderers in a loop).
         | 
         | Other PPLs I've studied also have a mix of these abstractions,
         | but make other key design distinctions in interface / type
         | design that seem to cause issues when it comes to building
         | modular inference layers (or exposing performance optimization,
         | or extension).
         | 
          | I also often think the design choices taken by other PPLs
          | feel overspecialized (optimized too early, for specific
          | inference patterns). I'm not blaming the creators! If you set
          | out to design abstractions, you often start with existing
          | problems.
         | 
          | On the other hand: if you're just solving similar problem
          | instances over and over again, in increasingly clever ways --
          | what's the point? Unless: (a) these problems are massive value
          | drivers for some sector, or (b) your increasingly clever ways
          | are driving down the cost, by reducing compute or increasing
          | speed.
         | 
         | I think PPLs which overspecialize to existing problems are
         | useful, but have trouble inspiring new paradigms in AI (or e.g.
         | new hardware accelerator design, etc).
         | 
         | Partially this is because there's an upper bound on the
         | inference complexity which you can express with these systems
         | -- so it is hard to reach cases where people can ask: what X
         | application would this enable if we could run this inference
         | approximation 1000x faster?
         | 
         | (Also note that inference approximations _can_ include neural
         | networks)
        
           | krisoft wrote:
           | > pose or trajectory estimation and SLAM, with renderers in a
           | loop
           | 
           | That sounds very interesting! Is there something I could read
           | more about it? Perhaps publications by you or anything like
           | that you could recommend?
        
             | mccoyb wrote:
              | I would start with this NeurIPS submission by my colleague
              | Nishad Gothoskar: https://proceedings.neurips.cc/paper/
              | 2021/hash/4fc66104f8ada... -- an excellent starting point
              | for seeing what good inference abstractions can enable.
        
         | KRAKRISMOTT wrote:
          | They are used heavily in ML -- how do you think VAEs work?
          | 
          | The white elephants are mostly the DSLs/frameworks that would
          | have been better off as torch/tensorflow extensions.
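          | 
          | For the record, the "probabilistic" part of a VAE is the KL
          | term in the ELBO; for a diagonal-Gaussian encoder it has a
          | simple closed form (a plain-Python sketch, values
          | illustrative):

```python
import math

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims:
    # 0.5 * sum( mu^2 + sigma^2 - 1 - log sigma^2 )
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)); a VAE minimizes the
# negative ELBO, so this KL term is the Bayesian piece of the loss
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # matches the prior -> 0.0
```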
        
           | palmy wrote:
            | Though I see what you're saying, using a PPL for VAEs just
            | seems like overkill given how simple VAEs are.
            | 
            | PPLs are useful when the data generation process is not
            | easily represented by something like a simple multivariate
            | Gaussian. You find many good examples in academic research,
            | e.g. epidemiology.
        
             | KRAKRISMOTT wrote:
              | Yes, but mathematical integration (to solve Bayesian
              | equations) is difficult, and the higher the dimension, the
              | more difficult it gets. That's why differentiation is
              | preferred. The concepts behind PPLs are firmly entrenched
              | in probabilistic ML; the ideas were never lost.
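              | 
              | Concretely, naive quadrature needs exponentially many
              | density evaluations as the dimension grows (a plain-Python
              | sketch; grid size and bounds are arbitrary choices):

```python
import math
from itertools import product

def gauss_unnorm(xs):
    # unnormalized standard normal density in len(xs) dimensions
    return math.exp(-0.5 * sum(x * x for x in xs))

def grid_integral(d, k=50, lo=-5.0, hi=5.0):
    # midpoint-rule quadrature on a k**d grid: cost blows up with d
    step = (hi - lo) / k
    pts = [lo + (i + 0.5) * step for i in range(k)]
    return sum(gauss_unnorm(xs) for xs in product(pts, repeat=d)) * step ** d

# exact answer is (2*pi)**(d/2); evaluation counts: 50, 2500, 125000
for d in (1, 2, 3):
    print(d, grid_integral(d), (2 * math.pi) ** (d / 2))
```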
        
         | smeeth wrote:
         | Some might disagree with me but my best guesses are:
         | 
          | - Probability math is confusing and difficult, and a base
          | understanding is required to use PPLs in a way that is not
          | true of other ML/DL. Most CS PhDs are not required to take
          | enough of it to find PPLs intuitive, so to be familiar they
          | would have had to opt into those classes. This is to say
          | nothing of BS/MS practitioners, so the user base is naturally
          | limited to the subset of people who studied Math/Stats in a
          | rigorous way AND opted into the right classes or taught
          | themselves later.
         | 
         | - Probabilistic models are often unique to the application.
         | They require lots of bespoke code, modeling, and understanding.
         | Contrast this with DL, where you throw your data in a blender
         | and receive outputs.
         | 
         | - Uncertainty quantification often is not the most important
         | outcome for sexy ML use cases. That is more frequently things
         | like "accuracy," "residual error," or "wow that picture looks
         | really good".
         | 
         | - PPL package tooling and documentation are often very
         | confusing and don't work similarly to one another. This isn't
         | necessarily the developer's fault, this stuff is hard, and the
         | people with the domain knowledge needed to actually understand
         | this stuff often have spent fewer hours in the open-source
         | trenches.
        
           | abeppu wrote:
           | Re your comment on CS PhDs not having probability background
           | -- do you find that's true of ML researchers? I would
           | understand that in a bunch of CS specialties, probability may
           | not be a requirement, but in ML I would have expected
           | otherwise.
        
             | uoaei wrote:
              | ML is fundamentally _not_ a CS specialty. It is a
              | statistics/optimization (thus applied math) specialty.
             | 
             | CS only comes into the picture at runtime. ML theory is
             | divorced from computability until then.
        
               | ke88y wrote:
               | That's a weird game to play with those words.
        
               | uoaei wrote:
               | Can you elaborate? The unreasonable effectiveness of
               | approximate methods on discretized spaces doesn't change
               | the fact that the theory underlying it is exact and
               | continuous.
        
             | gh02t wrote:
             | Not OP but I deal with this a lot. In my experience a lot
             | of folks working in mainstream ML haven't been exposed to
             | it unless they specifically focused on it. It might just be
             | a course load thing... getting the most out of these
             | probabilistic PLs requires fairly deep expertise in both
             | probability theory/Bayesian stats as well as in CS and you
             | have a finite amount of courses you can take in school.
             | Plus, a lot of the work in this area pre-dates the modern
             | focus on deep learning or machine learning in general, so a
             | lot of the knowledge tends to be held by
             | professors/researchers that may not be as involved with the
             | "new" ML courses. And of course, Math/Stats/CS departments
             | don't always play nicely with each other and like to fight
             | turf wars, though I've noticed cross-disciplinary research
             | among the three becoming more accepted at the
             | universities/institutes I work with.
             | 
             | As a case study, I did most of my grad work on solving
             | Bayesian inverse problems using probabilistic programming
             | for applications in engineering, which is pretty cross-
             | disciplinary. I now work mostly in ML, but I didn't really
             | even touch anything in the ML domain until after I finished
             | school. I could have, the courses were available, but they
             | just weren't relevant to me at the time.
             | 
              | Edit: I wouldn't be surprised if there _was_ a considerable
              | userbase in industries like finance, but in my experience
              | those folks don't share much.
        
             | junipertea wrote:
              | While there are some exceptions, the majority of published
              | deep learning research barely mentions statistics at all;
              | it's optimization all the way down.
        
         | 6gvONxR4sf7o wrote:
          | They've really taken off in niche places. If you have a
          | complex model of something, it's dramatically easier to use
          | one of these to build/fit your model than it is to code it by
          | hand.
          | 
          | But those cases are still ones where you might have just a
          | dozen variables (though each might be a long vector). It's
          | more the realm of statistical inference than general
          | programming or ML.
         | 
         | It hasn't "taken off" in ML because ML problems generally have
         | more specific solutions based on the problem. If you have
         | something simple and tabular, other solutions are generally
         | better. If you have something recsys shaped, other solutions
         | are generally better. If you have something vision/language
         | shaped, other solutions are generally better.
         | 
         | It hasn't "taken off" in general programming because PPLs
         | generally have trouble with control flow. Cutting off an entire
         | arm of a program is trivial in a traditional language, but in
         | PPLs you'll have to evaluate both. If the arm is a recursion
         | step and hitting the base case is probabilistic, you might even
         | have to evaluate arbitrarily deep (or you approximate that in a
         | way that significantly limits the breadth of techniques
         | available for running a program).
         | 
          | AFAICT, a truism in PPL is that there are always programs that
          | your language will run poorly on but that a bespoke engine
          | will handle better, by an extreme margin. There just aren't
          | general probabilistic languages that perform as reliably as
          | deterministic languages do.
         | 
         | It's also just really really hard. It's roughly impossible to
         | make things that are easy in normal languages easy to work with
         | in PPLs. Consider these examples:
         | 
         | `def f(x, y): return x + y + noise` where you condition on
         | `f(3, y) == 5`. It's easy.
         | 
         | `def f(password, salt): return hash(password + salt)` where you
         | condition on `f(password, 8123746) == 1293487`. It's basically
         | not going to happen even though forward evaluation of f is
         | straightforward in any traditional language.
         | 
         | Hell, even just supporting `def f(x, y): return x+y` is hard to
         | generalize. Surprisingly it's harder to generalize than the
         | `x+y+noise` case.
        
           | mccoyb wrote:
           | I think you're overgeneralizing in your control flow
           | discussion.
           | 
           | I also don't understand your f example with (x, y, noise) if
           | you fix x and the return value, you still have two unknowns
           | with 1 equation. How is that easy to solve?
           | 
           | Unless you're considering using parametric inverses to
           | represent the solution -- but you didn't mention this so I
           | assume you didn't mean this.
        
             | 6gvONxR4sf7o wrote:
              | I was being underspecific to be concise, but in pyro, I'd
              | mean something like this:
              | 
              |     def f(x, y):
              |         return pyro.sample(
              |             "z",
              |             dist.Normal(x+y, 1),
              |             obs=5
              |         )
              | 
              |     model = pyro.condition(f, x=3)
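              | 
              | (A self-contained way to see why that conditioning is
              | easy: importance sampling against the prior, in plain
              | Python -- the prior and noise scales here are made up:)

```python
import math
import random

random.seed(0)

# model: f(x, y) = x + y + Normal(0, 1) noise; condition on f(3, y) == 5
x_fixed, z_obs = 3.0, 5.0

num = den = 0.0
for _ in range(200_000):
    y = random.gauss(0.0, 10.0)  # broad prior on y (an assumption)
    w = math.exp(-0.5 * (z_obs - (x_fixed + y)) ** 2)  # likelihood weight
    num += w * y
    den += w

posterior_mean = num / den
print(posterior_mean)  # analytic posterior mean is 2 * 100/101 ~= 1.98
```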
        
         | esafak wrote:
         | These things go in and out of fashion. Now it's LLMs' turn to
         | have their fifteen minutes.
         | 
         | I think one reason why Bayesian models have not taken off is
         | that representing prediction uncertainty comes at the expense
         | of accuracy, for a given model size. People prefer to devote
         | model capacity to reducing the bias rather than modeling
         | uncertainty.
         | 
         | Bayesian models make more sense in the small-data regime, where
         | uncertainty looms large.
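          | 
          | A conjugate Beta-Binomial example makes the small-data point
          | concrete (plain Python; the counts are invented):

```python
import math

def beta_binomial_posterior(heads, n, a=1.0, b=1.0):
    # uniform Beta(1, 1) prior + Binomial data -> Beta(a + heads, b + misses)
    a2, b2 = a + heads, b + (n - heads)
    mean = a2 / (a2 + b2)
    sd = math.sqrt(a2 * b2 / ((a2 + b2) ** 2 * (a2 + b2 + 1)))
    return mean, sd

print(beta_binomial_posterior(3, 5))        # small n: wide posterior
print(beta_binomial_posterior(3000, 5000))  # large n: collapses to the MLE
```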
        
         | rustybolt wrote:
         | > Does anyone who works in this area have a sense of why PPLs
         | haven't "taken off" really?
         | 
         | Why should they take off? At least for me personally it's not
          | clear what the use case is, and this website answers exactly
         | none of my questions.
        
         | singhrac wrote:
         | I have spent a lot of time trying to use PPLs (including Pyro,
         | Edward, numpyro, etc.) in Real World data science use cases,
         | and many times mixing probabilistic programming (which in these
         | contexts means Bayesian inference on graphical models) and deep
         | networks (lots of parameters) doesn't work simply because you
         | don't have very strong priors. There are cases where these are
         | considered very effective (e.g. medicine, econometrics, etc.)
         | but I haven't worked in those areas.
         | 
         | NUTS-based approaches like Stan (and numpyro) have more usage,
         | and I think Prophet is a good example of a generalizable (if
         | limited) tool built on top of PPLs.
         | 
         | Pyro is a very impressive system, as is numpyro, which I think
         | is the successor since Uber AI disbanded (it's much faster).
        
         | rich_sasha wrote:
          | I'm no authority on the subject, but FWIW I tried quite a bit
          | to make various Bayesian methods work for me. I never found
          | them to outperform equivalent frequentist (point estimate)
          | methods.
         | 
         | Modelling uncertainty sounds nice and sometimes is a goal in
         | itself, but often at the end of the day you need a point
         | estimate. And then IME all the priors, flexible models,
         | parameter distributions, just don't add anything. You could
         | imagine they do, with a more flexible model, but that is not my
         | experience.
         | 
          | But then, PPLs are just so much harder. The initial premise is
          | nice -- you write a program with some unknown parameters, you
          | have some inputs and outputs, and get some probabilistic
          | estimates out. But in practice it is way more complex. It can
          | easily and silently diverge (i.e. converge to a totally wrong
          | distribution), and even plain vanilla Bayesian estimation is a
          | dark art.
        
           | thumbuddy wrote:
            | I've had them outperform frequentist methods, but there is
            | a real cost to it, and some of it is closer to witchcraft
            | than science.
           | 
           | That said, I'll have to give this a spin sometime soon.
        
         | ke88y wrote:
         | I think largely for the same reason that numerical software
         | took off where symbolic solvers didn't.
         | 
         | Much more user friendly, "good enough", and actually scales to
         | problems of commercial interest.
        
         | nextos wrote:
          | It's much more expensive to train models. Besides, compilers
          | are not that smart yet. E.g. an HMM implemented in a PPL is
          | _far_ from the efficiency of hand-rolled code. For many use
          | cases, PPLs are still a leaky abstraction.
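          | 
          | (The hand-rolled baseline here is just the forward recursion,
          | O(T*S^2) -- a plain-Python sketch with made-up parameters:)

```python
def hmm_forward(obs, init, trans, emit):
    """Hand-rolled HMM forward algorithm: P(obs sequence) in O(T * S^2).

    init[s]     : P(state_0 = s)
    trans[s][t] : P(state_{i+1} = t | state_i = s)
    emit[s][o]  : P(obs = o | state = s)
    """
    alpha = [init[s] * emit[s][obs[0]] for s in range(len(init))]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * trans[s][t] for s in range(len(alpha)))
                 * emit[t][o] for t in range(len(alpha))]
    return sum(alpha)

# toy 2-state chain (numbers are illustrative)
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(hmm_forward([0, 1, 0], init, trans, emit))
```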
         | 
         | However, in areas where measuring uncertainty is important,
         | they have taken off. Stan has become mainstream in Bayesian
         | statistics. Pyro and PyMC are also quite used in industry (I
         | have had recruiters contacting me for this skill). Infer.NET
         | has its own niche on discrete and online inference. Infer.NET
         | models ship with several Microsoft products.
         | 
         | Other interesting PPLs include Turing.jl, Gen.jl, and the
         | venerable BUGS.
        
           | rich_sasha wrote:
            | I'd be curious as to what the specific end applications are
            | -- are there any you can share?
           | 
           | I'm familiar with the tooling and played with it quite a
           | bit... But never really figured out a practical application.
        
         | palmy wrote:
         | I work on one of these PPLs, and I personally find Bayesian
         | inference to be useful in a few cases:
         | 
         | 1. When your main objective is not prediction but understanding
         | the effect of some underlying / unobserved random variable.
         | 
          | 2. When you don't have tons of data + you have very clear
          | ideas of the data generation process.
         | 
          | (1) is mainly relevant for science rather than private
          | companies. E.g. if you're an epidemiologist, you're generally
          | speaking interested in determining the effect of certain
          | underlying factors, e.g. the effect of mobility patterns,
          | rather than just predicting the number of infected people
          | tomorrow, since the hidden variables are often something you
          | can directly control, e.g. by imposing travel restrictions.
         | 
         | (2) can occur either in academic settings or in private sector
         | in applications such as revenue optimization. In these
         | scenarios, it's also very useful to have a notion of the "risk"
         | you're taking by optimizing according to this model. Such a
          | notion of risk is completely straightforward in the Bayesian
          | framework, while less so in frequentist approaches.
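          | 
          | For instance, given posterior samples of a conversion rate,
          | that risk is one line (plain Python; the posterior and the
          | breakeven value are invented):

```python
import random

random.seed(1)

# pretend these are posterior samples of a conversion rate from a sampler
samples = [random.betavariate(30, 70) for _ in range(10_000)]

breakeven = 0.25
risk = sum(s < breakeven for s in samples) / len(samples)
print(risk)  # P(rate < breakeven) under the posterior
```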
         | 
         | I've been involved in the above scenarios and have seen clear
         | advantages of using Bayesian inference, both in academia and
         | private sector.
         | 
          | With that being said, I don't think even Bayesian inference,
          | and thus even less so PPLs, is going to "take off" in a
          | similar fashion to many other machine learning techniques. The
          | reasons for this are fairly simple:
          | 
          | 1. It's difficult. Applying these techniques efficiently and
          | correctly is way more difficult than standard frequentist
          | methods (even interpreting the results is often non-trivial).
         | 
         | 2. The applicability of Bayesian inference (and thus PPLs) is
         | just so much more limited due to the computational complexity +
         | reduction in utility of the methods as data increases (which,
         | for private companies, is more and more the case).
         | 
          | PPLs mainly try to address (1), and we do have some very
          | successful examples of this, e.g. PyMC3 (they also have a
          | bunch of nice examples of applying Bayesian inference in a
          | private sector context), and Stan (maybe more heavily used in
          | academia).
        
       | randrus wrote:
       | There's a name collision with "Python Remote Objects". Which I
       | have to see as unfortunate, given my scars from that other pyro.
        
         | gjvc wrote:
         | I flirted with that but never used it. What was it like?
        
           | randrus wrote:
           | It's a remote object model, very similar in spirit to CORBA.
           | This allows the object creator/user and the object itself to
           | be in different fault domains - which makes it all too easy
           | to lose track of objects and leak them, unless you've added
           | significant management scaffolding.
        
       | jmugan wrote:
       | I wish Pyro would do a better job of hiding the implementation
       | details. I shouldn't need to understand variational inference and
       | such just to get the probability of a god dang hot dog. I've
       | tried to use Pyro a few times, but every time I spend more effort
       | trying to understand poutines and such instead of modeling my
       | problem.
        
         | nerdponx wrote:
         | FWIW Stan can work like this at least in simpler models,
         | especially if you use one of its R wrapper packages.
        
         | jmugan wrote:
         | And I wish they would merge it with the beautiful explanations
         | at https://probmods.org/. We need a practical probabilistic
         | programming language in Python. We have PyMC, but to use that
         | you have to pull out your old notes on Theano.
        
           | nerdponx wrote:
           | PyStan? Numpyro?
        
             | theptip wrote:
              | Interested in folks' thoughts on how BeanMachine
              | compares, too.
        
             | jmugan wrote:
             | Those didn't pop up last time I searched in this area. I
             | knew about Stan but not PyStan. I'll check those out.
        
       ___________________________________________________________________
       (page generated 2023-07-28 23:00 UTC)