[HN Gopher] Pyro: A Universal, Probabilistic Programming Language
___________________________________________________________________
Pyro: A Universal, Probabilistic Programming Language
Author : optimalsolver
Score : 104 points
Date : 2023-07-28 07:10 UTC (15 hours ago)
(HTM) web link (pyro.ai)
(TXT) w3m dump (pyro.ai)
| cube2222 wrote:
| Another cool probabilistic programming language is Turing[0] in
| Julia.
|
| Had fun using it when working my way through the Statistical
| Rethinking series.
|
| [0]: https://turing.ml/
| abeppu wrote:
| > Pyro enables flexible and expressive deep probabilistic
| modeling, unifying the best of modern deep learning and Bayesian
| modeling.
|
| Does anyone who works in this area have a sense of why PPLs
| haven't "taken off" really? Like, of the last several years of
| surprising ML successes, I can't really think of any major ones
| come from this line of work. To the extent that Bayesian
| perspectives contribute to deep learning, I more often see e.g.
| some particular take on ensembling around the same models trained
| to find a point estimate via SGD, rather than models built up
| from random variables about which we update beliefs including
| representation of uncertainty.
| mccoyb wrote:
| I don't think the field has converged on "the right
| abstractions" yet.
|
| It's an active area of programming language research -- it
| feels similar to where AD was at for awhile.
|
| I work on this stuff for my research -- so I do believe that
| there is a really good set of abstractions. My lab has had good
| success solving problems with these abstractions (which you
| might not think are amenable or scale well with Bayesian
| techniques, like pose or trajectory estimation and SLAM, with
| renderers in a loop).
|
| Other PPLs I've studied also have a mix of these abstractions,
| but make other key design distinctions in interface / type
| design that seem to cause issues when it comes to building
| modular inference layers (or exposing performance optimization,
| or extension).
|
| I also often have the opinion that the design choices taken by
| other PPLs feel overspecialized (optimized too early, for
| specific inference patterns). I'm not blaming the creators! If
| you set out to design abstractions, you often start from existing
| problems.
|
| On the other hand: if you're just solving similar problem
| instances over and over again, in increasingly clever ways --
| what's the point? Unless: (a) these problems are massive value
| drivers for some sector, or (b) your increasingly clever ways are
| driving down the cost, by reducing compute, or increasing
| speed.
|
| I think PPLs which overspecialize to existing problems are
| useful, but have trouble inspiring new paradigms in AI (or e.g.
| new hardware accelerator design, etc).
|
| Partially this is because there's an upper bound on the
| inference complexity which you can express with these systems
| -- so it is hard to reach cases where people can ask: what X
| application would this enable if we could run this inference
| approximation 1000x faster?
|
| (Also note that inference approximations _can_ include neural
| networks)
| krisoft wrote:
| > pose or trajectory estimation and SLAM, with renderers in a
| loop
|
| That sounds very interesting! Is there something I could read
| more about it? Perhaps publications by you or anything like
| that you could recommend?
| mccoyb wrote:
| I would start with this submission by my colleague Nishad
| Gothoskar to NeurIPS: https://proceedings.neurips.cc/paper/
| 2021/hash/4fc66104f8ada... as an excellent starting point
| to what good inference abstraction can enable
| KRAKRISMOTT wrote:
| They are used heavily in ML, how do you think VAEs work?
|
| The white elephants are mostly the DSLs/frameworks that would
| have been better off as torch/tensorflow extensions.
| palmy wrote:
| Though I see what you're saying, using a PPL for VAEs just
| seems like overkill given the simplistic nature of VAEs.
|
| PPLs are useful when the data generation process is not
| easily represented by something like a simple multivariate
| Gaussian, etc. You find many good examples in academic research,
| e.g. epidemiology.
| KRAKRISMOTT wrote:
| Yes but mathematical integration (to solve Bayesian
| equations) is difficult, the higher the dimension, the more
| difficult it is. That's why differentiation is preferred.
| The concepts behind PPL are firmly entrenched in
| probabilistic ML, the ideas were never lost.
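The "higher the dimension, the more difficult" point can be made concrete with a toy importance sampler (a pure-Python sketch, not from the thread; the function names are made up): even with a mild per-dimension mismatch between target and proposal, the effective sample size of a Monte Carlo integral collapses as dimension grows.

```python
import math
import random

def effective_sample_size(log_weights):
    """ESS = (sum w)^2 / sum w^2, computed stably from log-weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    s1 = sum(w)
    s2 = sum(x * x for x in w)
    return s1 * s1 / s2

def ess_for_dim(d, n=5000, seed=0):
    """Importance sampling with target N(0.5, 1)^d and proposal
    N(0, 1)^d; the log-weight is sum_i [log p(x_i) - log q(x_i)]."""
    rng = random.Random(seed)
    log_ws = []
    for _ in range(n):
        lw = 0.0
        for _ in range(d):
            x = rng.gauss(0.0, 1.0)  # draw each coordinate from q
            # log N(x; 0.5, 1) - log N(x; 0, 1), constants cancel
            lw += -0.5 * (x - 0.5) ** 2 + 0.5 * x ** 2
        log_ws.append(lw)
    return effective_sample_size(log_ws)

# ESS drops steeply with dimension despite a fixed sample budget.
print(ess_for_dim(1), ess_for_dim(10), ess_for_dim(50))
```

This degeneracy of naive integration in high dimensions is one reason gradient-based (differentiation-driven) methods dominate.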
| smeeth wrote:
| Some might disagree with me but my best guesses are:
|
| - Probability math is confusing and difficult, and a base
| understanding is required to use PPLs in a way that is not true
| of other ML/DL. Most CS PhDs will not be required to take
| enough of it to find PPLs intuitive, so to be familiar they
| will have had to opt into those classes. This is to say nothing
| of BS/MS practitioners, so the user base is naturally limited
| to the subset of people who studied Math/Stats in a rigorous
| way AND opted into the right classes or taught themselves
| later.
|
| - Probabilistic models are often unique to the application.
| They require lots of bespoke code, modeling, and understanding.
| Contrast this with DL, where you throw your data in a blender
| and receive outputs.
|
| - Uncertainty quantification often is not the most important
| outcome for sexy ML use cases. That is more frequently things
| like "accuracy," "residual error," or "wow that picture looks
| really good".
|
| - PPL package tooling and documentation are often very
| confusing and don't work similarly to one another. This isn't
| necessarily the developer's fault, this stuff is hard, and the
| people with the domain knowledge needed to actually understand
| this stuff often have spent fewer hours in the open-source
| trenches.
| abeppu wrote:
| Re your comment on CS PhDs not having probability background
| -- do you find that's true of ML researchers? I would
| understand that in a bunch of CS specialties, probability may
| not be a requirement, but in ML I would have expected
| otherwise.
| uoaei wrote:
| ML is fundamentally _not_ a CS specialty. It is a
| statistics/optimization (thus applied math) specialty.
|
| CS only comes into the picture at runtime. ML theory is
| divorced from computability until then.
| ke88y wrote:
| That's a weird game to play with those words.
| uoaei wrote:
| Can you elaborate? The unreasonable effectiveness of
| approximate methods on discretized spaces doesn't change
| the fact that the theory underlying it is exact and
| continuous.
| gh02t wrote:
| Not OP but I deal with this a lot. In my experience a lot
| of folks working in mainstream ML haven't been exposed to
| it unless they specifically focused on it. It might just be
| a course load thing... getting the most out of these
| probabilistic PLs requires fairly deep expertise in both
| probability theory/Bayesian stats as well as in CS and you
| have a finite amount of courses you can take in school.
| Plus, a lot of the work in this area pre-dates the modern
| focus on deep learning or machine learning in general, so a
| lot of the knowledge tends to be held by
| professors/researchers that may not be as involved with the
| "new" ML courses. And of course, Math/Stats/CS departments
| don't always play nicely with each other and like to fight
| turf wars, though I've noticed cross-disciplinary research
| among the three becoming more accepted at the
| universities/institutes I work with.
|
| As a case study, I did most of my grad work on solving
| Bayesian inverse problems using probabilistic programming
| for applications in engineering, which is pretty cross-
| disciplinary. I now work mostly in ML, but I didn't really
| even touch anything in the ML domain until after I finished
| school. I could have, the courses were available, but they
| just weren't relevant to me at the time.
|
| Edit: I wouldn't be surprised if there _were_ a considerable
| userbase in industries like finance, but in my experience
| those folks don't share much.
| junipertea wrote:
| While there are some exceptions, the majority of published deep
| learning research barely mentions statistics at all; it's
| optimization all the way down.
| 6gvONxR4sf7o wrote:
| They've really taken off in niche places. If you have a complex
| model of something, it's dramatically easier to use one of these
| to build/fit your model than it is to code it by hand.
|
| But those cases are still things where you might have just a
| dozen variables (though each might be a long vector). It's more
| the realm of statistical inference than it is general
| programming or ML.
|
| It hasn't "taken off" in ML because ML problems generally have
| more specific solutions based on the problem. If you have
| something simple and tabular, other solutions are generally
| better. If you have something recsys shaped, other solutions
| are generally better. If you have something vision/language
| shaped, other solutions are generally better.
|
| It hasn't "taken off" in general programming because PPLs
| generally have trouble with control flow. Cutting off an entire
| arm of a program is trivial in a traditional language, but in
| PPLs you'll have to evaluate both. If the arm is a recursion
| step and hitting the base case is probabilistic, you might even
| have to evaluate arbitrarily deep (or you approximate that in a
| way that significantly limits the breadth of techniques
| available for running a program).
|
| AFAICT, a truism in PPL is that there are always programs that
| your language will run poorly on but a bespoke engine will do
| better, by an extreme margin. There just aren't general PPLs
| that perform as reliably as deterministic languages do.
|
| It's also just really really hard. It's roughly impossible to
| make things that are easy in normal languages easy to work with
| in PPLs. Consider these examples:
|
| `def f(x, y): return x + y + noise` where you condition on
| `f(3, y) == 5`. It's easy.
|
| `def f(password, salt): return hash(password + salt)` where you
| condition on `f(password, 8123746) == 1293487`. It's basically
| not going to happen even though forward evaluation of f is
| straightforward in any traditional language.
|
| Hell, even just supporting `def f(x, y): return x+y` is hard to
| generalize. Surprisingly it's harder to generalize than the
| `x+y+noise` case.
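For concreteness, the "easy" case above can be sketched as naive rejection sampling in plain Python (an illustrative sketch with made-up names, no PPL involved): sample `y` from a prior, run the program forward, and keep runs whose output lands near the observed value. The same recipe is hopeless for the hash example, because essentially no prior sample ever hits the target output.

```python
import random

def f(x, y, rng):
    """Toy probabilistic program: x + y plus Gaussian noise."""
    return x + y + rng.gauss(0.0, 1.0)

def condition_by_rejection(x, target, tol=0.05, n=200_000, seed=0):
    """Crude inference: keep the y's whose forward run lands
    within tol of the observed value."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n):
        y = rng.uniform(-10.0, 10.0)  # flat prior over y
        if abs(f(x, y, rng) - target) < tol:
            accepted.append(y)
    return accepted

ys = condition_by_rejection(x=3.0, target=5.0)
posterior_mean = sum(ys) / len(ys)
print(round(posterior_mean, 2))  # posterior over y centers near 2
```

Real PPLs replace this brute force with MCMC or variational inference, but the underlying object (a distribution over executions consistent with the observation) is the same.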
| mccoyb wrote:
| I think you're overgeneralizing in your control flow
| discussion.
|
| I also don't understand your f example with (x, y, noise) if
| you fix x and the return value, you still have two unknowns
| with 1 equation. How is that easy to solve?
|
| Unless you're considering using parametric inverses to
| represent the solution -- but you didn't mention this so I
| assume you didn't mean this.
| 6gvONxR4sf7o wrote:
| I was being underspecific to be concise, but in pyro, I'd
| mean something like this:
|
|     def f(x, y):
|         return pyro.sample("z", dist.Normal(x + y, 1), obs=5)
|
|     model = pyro.condition(f, x=3)
| esafak wrote:
| These things go in and out of fashion. Now it's LLMs' turn to
| have their fifteen minutes.
|
| I think one reason why Bayesian models have not taken off is
| that representing prediction uncertainty comes at the expense
| of accuracy, for a given model size. People prefer to devote
| model capacity to reducing the bias rather than modeling
| uncertainty.
|
| Bayesian models make more sense in the small-data regime, where
| uncertainty looms large.
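The small-data point has a textbook illustration: a conjugate Beta-Bernoulli model (a hypothetical pure-Python sketch). With a handful of observations the posterior is wide and the uncertainty genuinely matters; with hundreds, it has collapsed and a point estimate tells nearly the same story.

```python
def beta_posterior(alpha, beta, data):
    """Conjugate update: Beta(alpha, beta) prior + Bernoulli data."""
    successes = sum(data)
    return alpha + successes, beta + len(data) - successes

def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return var ** 0.5

a, b = beta_posterior(1, 1, [1, 0, 1])          # 3 observations
a2, b2 = beta_posterior(1, 1, [1, 0, 1] * 100)  # 300 observations
print(round(beta_sd(a, b), 3), round(beta_sd(a2, b2), 3))
```

With 3 data points the posterior standard deviation is 0.2; with 300 it is under 0.03, at which point the Bayesian machinery adds little over the point estimate.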
| rustybolt wrote:
| > Does anyone who works in this area have a sense of why PPLs
| haven't "taken off" really?
|
| Why should they take off? At least for me personally it's not
| clear what the use case is, and this website answers exactly
| none of my questions.
| singhrac wrote:
| I have spent a lot of time trying to use PPLs (including Pyro,
| Edward, numpyro, etc.) in Real World data science use cases,
| and many times mixing probabilistic programming (which in these
| contexts means Bayesian inference on graphical models) and deep
| networks (lots of parameters) doesn't work simply because you
| don't have very strong priors. There are cases where these are
| considered very effective (e.g. medicine, econometrics, etc.)
| but I haven't worked in those areas.
|
| NUTS-based approaches like Stan (and numpyro) have more usage,
| and I think Prophet is a good example of a generalizable (if
| limited) tool built on top of PPLs.
|
| Pyro is a very impressive system, as is numpyro, which I think
| is the successor since Uber AI disbanded (it's much faster).
| rich_sasha wrote:
| I'm no authority on the subject, but FWIW I tried quite a bit
| to make various bayesian methods work for me. I never found
| them to outperform equivalent frequentist (point estimate)
| methods.
|
| Modelling uncertainty sounds nice and sometimes is a goal in
| itself, but often at the end of the day you need a point
| estimate. And then IME all the priors, flexible models,
| parameter distributions, just don't add anything. You could
| imagine they do, with a more flexible model, but that is not my
| experience.
|
| But then, PPL is just so much harder. The initial premise is
| nice - you write a program with some unknown parameters, you
| have some inputs and outputs, and get some probabilistic
| estimates out. But in practice it is way more complex. It can
| easily and silently diverge (i.e. converge to a totally wrong
| distribution), and even plain vanilla bayesian estimation is a
| dark art.
| thumbuddy wrote:
| I've had them outperform frequentist methods, but there is a
| real cost to it, and some of it is closer to witchcraft than
| science.
|
| That said, I'll have to give this a spin sometime soon.
| ke88y wrote:
| I think largely for the same reason that numerical software
| took off where symbolic solvers didn't.
|
| Much more user friendly, "good enough", and actually scales to
| problems of commercial interest.
| nextos wrote:
| It's much more expensive to train models. Besides, compilers
| are not that smart yet. E.g. an HMM implemented in a PPL is
| _far_ from the efficiency of hand-rolled code. For many use
| cases, they are still a leaky abstraction.
|
| However, in areas where measuring uncertainty is important,
| they have taken off. Stan has become mainstream in Bayesian
| statistics. Pyro and PyMC are also widely used in industry (I
| have had recruiters contacting me for this skill). Infer.NET
| has its own niche on discrete and online inference. Infer.NET
| models ship with several Microsoft products.
|
| Other interesting PPLs include Turing.jl, Gen.jl, and the
| venerable BUGS.
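For reference, the hand-rolled baseline alluded to above is small: the forward algorithm computes an HMM's marginal likelihood in O(T·K²) time, a structure a PPL compiler would have to rediscover from the generative program. A minimal sketch with made-up parameters:

```python
def hmm_likelihood(init, trans, emit, obs):
    """Forward algorithm: P(obs) for a discrete HMM.
    init[k]: P(z_0 = k); trans[i][j]: P(z_t = j | z_{t-1} = i);
    emit[k][o]: P(x_t = o | z_t = k)."""
    k_states = range(len(init))
    # Initialize with the first observation.
    alpha = [init[k] * emit[k][obs[0]] for k in k_states]
    # Propagate forward: sum over predecessor states, then emit.
    for o in obs[1:]:
        alpha = [
            emit[j][o] * sum(alpha[i] * trans[i][j] for i in k_states)
            for j in k_states
        ]
    return sum(alpha)

# Two hidden states, two output symbols (illustrative numbers).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(hmm_likelihood(init, trans, emit, [0, 0, 1]))
```

A PPL expresses the same model in a few `sample` statements, but a generic inference engine typically cannot match the efficiency of this direct dynamic program, which is the leaky-abstraction point above.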
| rich_sasha wrote:
| I'd be curious as to what the specific end applications are,
| are there any you can share?
|
| I'm familiar with the tooling and played with it quite a
| bit... But never really figured out a practical application.
| palmy wrote:
| I work on one of these PPLs, and I personally find Bayesian
| inference to be useful in a few cases:
|
| 1. When your main objective is not prediction but understanding
| the effect of some underlying / unobserved random variable.
|
| 2. When you don't have tons of data + you have very clear ideas
| the data generation process.
|
| (1) is mainly relevant for science rather than private
| companies, e.g. if you're an epidemiologist, you're generally
| speaking interested in determining the effect of certain
| underlying factors, e.g. effect of mobility patterns, rather
| than just predicting the number of infected people tomorrow
| since the hidden variables are often something you can directly
| control, e.g. impose travel restrictions.
|
| (2) can occur either in academic settings or in private sector
| in applications such as revenue optimization. In these
| scenarios, it's also very useful to have a notion of the "risk"
| you're taking by optimizing according to this model. Such a
| notion of risk is completely straightforward in the Bayesian
| framework, while less so in frequentist settings.
|
| I've been involved in the above scenarios and have seen clear
| advantages of using Bayesian inference, both in academia and
| private sector.
|
| With that being said, I don't think Bayesian inference, and
| thus even less so PPLs, is ever going to "take off" in a
| similar fashion to many other machine learning techniques. The
| reasons for this are fairly simple:
|
| 1. It's difficult. Applying these techniques efficiently and
| correctly is way more difficult than standard frequentist
| methods (even interpreting the results is often non-trivial).
|
| 2. The applicability of Bayesian inference (and thus PPLs) is
| just so much more limited due to the computational complexity +
| reduction in utility of the methods as data increases (which,
| for private companies, is more and more the case).
|
| PPLs mainly try to address (1), and we do have some very
| successful examples of this, e.g. PyMC3 (they also have a bunch
| of nice examples of applying Bayesian inference in private
| sector context), and Stan (maybe more heavily used in
| academia).
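As a concrete (entirely hypothetical) example of the "risk" point in (2): once you have posterior samples of, say, a conversion rate, a downside estimate falls out of the same samples used for the expectation, with no extra machinery.

```python
import random

def posterior_samples_beta(a, b, n, seed=0):
    """Draw conversion-rate samples from a Beta(a, b) posterior."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]

def expected_profit_and_risk(price, rate_samples, cost=2.0):
    """Profit per visitor = rate * (price - cost). Averaging over
    the posterior gives an expectation; a low quantile of the same
    samples is a natural notion of risk."""
    profits = [r * (price - cost) for r in rate_samples]
    mean = sum(profits) / len(profits)
    downside = sorted(profits)[len(profits) // 20]  # ~5th percentile
    return mean, downside

# Posterior after ~12 conversions out of 100 visits, flat-ish prior.
samples = posterior_samples_beta(a=12, b=88, n=10_000)
mean, p5 = expected_profit_and_risk(price=10.0, rate_samples=samples)
print(round(mean, 2), round(p5, 2))
```

A frequentist point estimate gives only the first number; the second is what lets you price the downside of acting on the model.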
| randrus wrote:
| There's a name collision with "Python Remote Objects". Which I
| have to see as unfortunate, given my scars from that other pyro.
| gjvc wrote:
| I flirted with that but never used it. What was it like?
| randrus wrote:
| It's a remote object model, very similar in spirit to CORBA.
| This allows the object creator/user and the object itself to
| be in different fault domains - which makes it all too easy
| to lose track of objects and leak them, unless you've added
| significant management scaffolding.
| jmugan wrote:
| I wish Pyro would do a better job of hiding the implementation
| details. I shouldn't need to understand variational inference and
| such just to get the probability of a god dang hot dog. I've
| tried to use Pyro a few times, but every time I spend more effort
| trying to understand poutines and such instead of modeling my
| problem.
| nerdponx wrote:
| FWIW Stan can work like this at least in simpler models,
| especially if you use one of its R wrapper packages.
| jmugan wrote:
| And I wish they would merge it with the beautiful explanations
| at https://probmods.org/. We need a practical probabilistic
| programming language in Python. We have PyMC, but to use that
| you have to pull out your old notes on Theano.
| nerdponx wrote:
| PyStan? Numpyro?
| theptip wrote:
| Interested in folks' thoughts on how BeanMachine compares
| too.
| jmugan wrote:
| Those didn't pop up last time I searched in this area. I
| knew about Stan but not PyStan. I'll check those out.
___________________________________________________________________
(page generated 2023-07-28 23:00 UTC)