[HN Gopher] Modeling Uncertainty with PyTorch
___________________________________________________________________
Modeling Uncertainty with PyTorch
Author : srom
Score : 44 points
Date : 2022-01-07 14:58 UTC (1 day ago)
(HTM) web link (romainstrock.com)
(TXT) w3m dump (romainstrock.com)
| gillesjacobs wrote:
| The field of ML is largely focused on just getting predictions
| with fancy models. Estimating the uncertainty, unexpectedness and
| perplexity of specific predictions is highly underappreciated in
| common practice.
|
| Even though it is highly economically valuable to be able to tell
| to what extent you can trust a prediction, modelling the
| uncertainty of ML pipelines remains an academic affair in my
| experience.
| NeedMoreTime4Me wrote:
| You are definitely right; there are numerous classic
| applications (i.e. outside of the cutting-edge CV/NLP stuff)
| that could greatly benefit from such a measure.
|
| The question is: Why don't people use these models? While
| Bayesian Neural Networks might be tricky to deploy & debug for
| some people, Gaussian Processes etc. are readily available in
| sklearn and other implementations.
|
| My theory: most people do not learn these methods in their
| "Introduction to Machine Learning" classes. Or is it lacking
| scalability in practice?
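|
| For instance, a minimal sklearn sketch (the kernel choice
| and the synthetic data below are illustrative assumptions,
| not a recommendation):
|
|   import numpy as np
|   from sklearn.gaussian_process import GaussianProcessRegressor
|   from sklearn.gaussian_process.kernels import RBF, WhiteKernel
|
|   rng = np.random.default_rng(0)
|   X = rng.uniform(0, 10, (50, 1))               # toy inputs
|   y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 50)  # noisy targets
|
|   # RBF models the smooth signal, WhiteKernel the noise.
|   gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
|   gpr.fit(X, y)
|
|   # The GP returns a predictive mean and std out of the box.
|   mean, std = gpr.predict(np.array([[2.5]]), return_std=True)
|   print(f"prediction: {mean[0]:.2f} +/- {2 * std[0]:.2f}")
|
| The catch: exact GP inference costs O(n^3) in the number of
| training points, which is the scalability wall that comes up
| below.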
| shakow wrote:
| > Or is it lacking scalability in practice?
|
| Only speaking from my own little perspective in
| bioinformatics, lack of scalability above all else, both for
| BNNs and GPs.
|
| Sure, the library support could be better, but that was not
| the main hurdle, more of a friction point.
| NeedMoreTime4Me wrote:
| Do you have an anecdotal guess about where the
| scalability barrier sits? E.g. does it take too long with
| more than 10,000 data points and 100 features? Just to
| get a feel.
| shakow wrote:
| Please don't quote me on that, as it was academic work in
| a given language and a given library and might not be
| representative of the whole ecosystem.
|
| But in a nutshell, on OK-ish CPUs (Xeons a few
| generations old), we started seeing problems past a few
| thousand points with a few dozen features.
|
| And not only was training slow, but so was inference:
| since we used the whole sampled chain of the weight-
| distribution parameters, memory consumption was a sight
| to behold, and inference time quickly went through the
| roof when subsampling was not used.
|
| And all that was on standard NNs, so no complexity added
| by e.g. convolution layers.
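|
| To make that inference cost concrete, here is a minimal
| PyTorch sketch (the model and the `weight_samples` chain
| are hypothetical stand-ins for whatever the sampler
| produced):
|
|   import torch
|
|   def posterior_predictive(model, weight_samples, x,
|                            subsample=None):
|       # Cost grows linearly with the number of posterior
|       # samples used, hence the appeal of subsampling.
|       if subsample is not None:
|           idx = torch.randperm(len(weight_samples))[:subsample]
|           weight_samples = [weight_samples[i] for i in idx]
|       preds = []
|       with torch.no_grad():
|           # One full forward pass per posterior sample.
|           for state in weight_samples:
|               model.load_state_dict(state)
|               preds.append(model(x))
|       preds = torch.stack(preds)          # (n_samples, batch, out)
|       return preds.mean(0), preds.std(0)  # mean and spread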
| disgruntledphd2 wrote:
| It takes more compute, and the errors from badly chosen data
| vastly outweigh the uncertainties associated with your
| parameter estimate.
|
| To be fair, I suspect lots of people do this, but for
| whatever reason nobody talks about it.
| b3kart wrote:
| They often don't scale, and they are tricky to implement
| in frameworks that people are familiar with. But most
| importantly, they make crude approximations, meaning that
| after all this effort they often don't beat simple
| baselines like the bootstrap. It's an exciting area of
| research though.
| marbletimes wrote:
| When I was in academia, I used to fit highly sophisticated
| models (think many-parameter, multi-level non-linear mixed
| effect models) which gave not only point estimates but
| also confidence and prediction intervals ("please explain
| to me the difference between the two" is one of my
| favorite interview questions, and I have still not heard a
| correct answer).
|
| When I tried to bring an "uncertainty mindset" over when I
| moved to industry, I found that (1) most DS/ML scientists
| use ML models that typically don't provide an easy way to
| estimate uncertainty intervals; (2) in the industry I was
| in (media), the people who make decisions, with model
| predictions as one of the inputs to their decision-making,
| are typically not very quantitative, and an uncertainty
| interval, rather than strengthen their process, would
| confuse them more than anything else: they want a "more or
| less" estimate rather than a "more or less plus something
| more and something less" estimate; (3) when services are
| customer-facing (see ride-sharing), providing an
| uncertainty interval ("your car will arrive in 9 to 15
| minutes") would anchor the customer to the lower estimate
| (ride-sharing apps do this for the price of rides booked
| in advance, and they need to, but they are often way off).
|
| So for many ML applications, an uncertainty interval that
| nobody internally or externally would base their decision upon
| is just a nuisance.
| curiousgal wrote:
| > the difference between the two
|
| One is bigger than the other, as far as I remember, which
| means that the standard error of the prediction interval
| is bigger?
| marbletimes wrote:
| From a good SO answer, see https://stats.stackexchange.com/questions/16493/difference-b...
|
| "A confidence interval gives a range for E[y|x], as you
| say. A prediction interval gives a range for y itself."
|
| In the vast majority of cases, what we want is the range
| for y (the prediction interval), that is: given x = 3,
| what is the expected distribution of y? For example, say
| we train a model to estimate how the 100-m dash time
| varies with age. The uncertainty we want is, "at age 48,
| 90% of Master Athletes run the 100-m dash between 10.2 and
| 12.4 seconds" (here there would be another difference to
| point out between Frequentist and Bayesian intervals, but
| let's keep things simple).
|
| We are generally not interested in: given x = 3, what is
| the uncertainty of the expected value of y (that is, the
| confidence interval)? In that case, the uncertainty we get
| (we might want it, but often we do not) is, "at age 48, we
| are 90% confident that the expected time to complete the
| 100-m dash for Master Athletes is between 11.2 and 11.6
| seconds".
| code_biologist wrote:
| Great answer. It prompts a bunch of followup questions!
|
| _most DS/ML scientists use ML models that typically don't
| provide an easy way to estimate uncertainty intervals_
|
| Not a DS/ML scientist but a data engineer. The models I've
| used have been pretty much "slap it into XGBoost with
| k-fold CV, call it done" -- an easy black box. Is there
| any model or approach you like for estimating uncertainty
| with similar ease?
|
| I've seen uncertainty interval / quantile regression done
| using XGBoost, but it isn't out of the box. I've also been
| trying to learn some Bayesian modeling, but definitely don't
| feel handy enough to apply it to random problems needing
| quick answers at work.
| marbletimes wrote:
| Correct, quantile regression is an option. Another is
| "pure" bootstrapping (you can see by googling something
| like uncertainty + machine learning + bootstrapping that
| this is a very active area of current research).
|
| The major problem with bootstrapping is the computational
| time for big models, since many models need to be fit to
| obtain a representative distribution of predictions.
|
| Now, if you want more "rigorous" quantification of
| uncertainty, one option is to go Bayesian using
| probabilistic programming (PyMC, Stan, TMB), but
| computational time for large models can be prohibitive.
| Another option is to "scale down" the complexity to models
| that might be (on average) a bit less accurate, but provide
| rigorous uncertainty intervals and good interpretability of
| results, for example Generalized Additive Models.
|
| A note here: I have seen quantifications of uncertainty,
| by people considered very capable in the ML community,
| that gave me goosebumps. For example, since the lower
| bound of the interval was a negative number and the
| response variable modeled could not be negative, the
| uncertainty interval was simply "cut" at zero. (One easy
| way to deal with this, although it depends on the variable
| modeled and the model itself, is log-transforming the
| response--but pay attention to what happens to the
| intervals when you apply exp() to get back to the natural
| scale. Another useful interview question.)
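|
| As a minimal sketch of the quantile-regression option,
| using sklearn's gradient boosting as a stand-in for
| XGBoost (data and quantiles are illustrative):
|
|   import numpy as np
|   from sklearn.ensemble import GradientBoostingRegressor
|
|   rng = np.random.default_rng(0)
|   X = rng.uniform(0, 10, (500, 1))
|   y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 500)
|
|   # One model per quantile: lower bound, median, upper.
|   models = {
|       q: GradientBoostingRegressor(loss="quantile",
|                                    alpha=q).fit(X, y)
|       for q in (0.05, 0.5, 0.95)
|   }
|
|   x_new = np.array([[2.5]])
|   lo, med, hi = (models[q].predict(x_new)[0]
|                  for q in (0.05, 0.5, 0.95))
|   print(f"90% interval: [{lo:.2f}, {hi:.2f}], median: {med:.2f}")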
| joconde wrote:
| What do "multi-level" and "mixed effects" mean? There are
| tons of non-linear models with lots of parameters, but I've
| never heard these other terms.
| canjobear wrote:
| https://en.wikipedia.org/wiki/Nonlinear_mixed-effects_model
| math_dandy wrote:
| Uncertainty estimates in traditional parametric statistics are
| facilitated by strong assumptions on the distribution of the
| data being analyzed.
|
| In traditional nonparametric statistics, uncertainty
| estimates are obtained by a process called bootstrapping.
| But there's a trade-off (there's no free lunch!). If you
| want to eschew strong distributional hypotheses, you need
| to pay for it with more data and more compute. The "more
| compute" typically involves fitting the model in question
| to many resamples of the original dataset, drawn with
| replacement. In deep learning applications in which each
| fit of the model is extremely expensive, this is
| impractical.
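|
| A minimal sketch of the bootstrap, assuming a cheap model
| so refitting many times is affordable (the spread below
| captures model uncertainty, not the noise in y itself):
|
|   import numpy as np
|   from sklearn.tree import DecisionTreeRegressor
|
|   rng = np.random.default_rng(0)
|   X = rng.uniform(0, 10, (300, 1))
|   y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 300)
|   x_new = np.array([[2.5]])
|
|   # Refit the same model on resamples drawn with
|   # replacement; the spread of predictions is the
|   # uncertainty estimate.
|   preds = []
|   for _ in range(200):
|       idx = rng.integers(0, len(X), len(X))
|       tree = DecisionTreeRegressor(max_depth=4)
|       preds.append(tree.fit(X[idx], y[idx]).predict(x_new)[0])
|
|   lo, hi = np.percentile(preds, [5, 95])
|   print(f"90% bootstrap interval: [{lo:.2f}, {hi:.2f}]")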
___________________________________________________________________
(page generated 2022-01-08 23:01 UTC)