https://stats.stackexchange.com/questions/639548/why-is-everything-based-on-likelihoods-even-though-likelihoods-are-so-small

Why is everything based on likelihoods even though likelihoods are so small?

Asked 20 hours ago · Viewed 23k times · Score 15
Tags: maximum-likelihood, likelihood
Asked by ionojoseph (new contributor)

Suppose I generate some random numbers from a specific normal distribution in R:

set.seed(123)
random_numbers <- rnorm(50, mean = 5, sd = 5)

These numbers look like this:

 [1]  2.1976218  3.8491126 12.7935416  5.3525420  5.6464387 13.5753249  7.3045810 -1.3253062
 [9]  1.5657357  2.7716901 11.1204090  6.7990691  7.0038573  5.5534136  2.2207943 13.9345657
[17]  7.4892524 -4.8330858  8.5067795  2.6360430 -0.3391185  3.9101254 -0.1300222  1.3555439
[25]  1.8748037 -3.4334666  9.1889352  5.7668656 -0.6906847 11.2690746  7.1323211  3.5246426
[33]  9.4756283  9.3906674  9.1079054  8.4432013  7.7695883  4.6904414  3.4701867  3.0976450
[41]  1.5264651  3.9604136 -1.3269818 15.8447798 11.0398100 -0.6155429  2.9855758  2.6667232
[49]  8.8998256  4.5831547

Now, suppose I calculate the likelihood of these numbers under the correct normal distribution:

likelihood <- prod(dnorm(random_numbers, mean = 5, sd = 5))
[1] 9.183016e-65

As we can see, even under the correct distribution, the likelihood is very, very small. Thus, it appears to be very unlikely, in a certain sense, that these numbers came from the very distribution they were generated from. The only consolation is that the likelihood is even smaller under some other distribution, e.g.

> likelihood <- prod(dnorm(random_numbers, mean = 6, sd = 6))
> likelihood
[1] 3.954015e-66

But this seems to me like a moot point: a turtle is faster than a snail, but both animals are slow. Even though the correct likelihood (i.e. mean 5, sd 5) is bigger than the incorrect likelihood (i.e. mean 6, sd 6), both are still so small! So how come in statistics everything is based on likelihoods (e.g. regression estimates, maximum likelihood estimation, etc.) when the evaluated likelihood is always so small, even for the correct distribution?
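For scale, a quick sketch of why such tiny values are unavoidable here: each factor in the product is a density value, and the N(5, 5^2) density can never exceed its peak of 1/(5 * sqrt(2 * pi)), so no 50-point sample can produce a likelihood above roughly 1.2e-55.

# Upper bound on any 50-point likelihood under N(5, 5^2):
# each factor is at most the density's peak value.
peak <- 1 / (5 * sqrt(2 * pi))  # maximum possible density, ~0.0798
peak^50                         # ~1.2e-55 -- even the best conceivable sample is "tiny"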
Comments (5 of 8 shown):

* Dave: Welcome to Cross Validated! Would it help if we normalized the area under a PDF to a googol to inflate these numbers?
* ionojoseph: Isn't that for integration?
* Dave: Yes, but think about how high the PDF y-values (the likelihood values) would be for the area under the PDF to be a googol.
* ionojoseph: I am a bit confused. Integrating a density gives the probability of observing a range of values; the likelihood is for individual points, because the probability of observing an individual point is 0, as I understand it?
* Michael Lew: The probability of observing an exact value from a truly continuous distribution might be zero, but your values are nowhere near exact, as they are expressed to 8 significant figures. The probability of observing a value that rounds to, or is observed to, 8 significant figures is much higher than zero.

3 Answers

Answer 1 (score 9, by ADAM):

The key lies not in the absolute size of the likelihood values but in their relative comparison and in the mathematical principles underlying likelihood-based methods.

The smallness of the likelihood is expected when dealing with continuous distributions and a product of many densities: you are multiplying many numbers that are each less than 1. The utility of likelihoods comes from their comparative nature, not their absolute values. When we compare likelihoods across different sets of parameters, we are asking which parameters make the observed data "most likely" relative to other parameter sets, not looking for a likelihood that says the data are likely in an absolute sense. The scale of the likelihood matters far less than how it changes as the parameters change. This is why many statistical methods, such as maximum likelihood estimation, seek the parameters that maximize the likelihood function: these are the best estimates given the data.

Because likelihood values can be extremely small, in practice statisticians work with the log of the likelihood. This transformation turns products into sums, making the values more manageable and the optimization problems easier to solve, while preserving the location of the maximum.
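A minimal sketch of the computational point (the sample size of 5000 is hypothetical, chosen only to force underflow):

# With enough data, the raw likelihood underflows double precision,
# while the sum of log-densities stays finite and usable.
x <- rnorm(5000, mean = 5, sd = 5)
prod(dnorm(x, mean = 5, sd = 5))             # 0 -- underflows to zero
sum(dnorm(x, mean = 5, sd = 5, log = TRUE))  # roughly -15000, still workable

Applied to the question's data, the comparison on the log scale looks like this: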
set.seed(123)
random_numbers <- rnorm(50, mean = 5, sd = 5)

# Function to calculate the log-likelihood under a normal distribution
log_likelihood <- function(data, mean, sd) {
  sum(dnorm(data, mean, sd, log = TRUE))
}

# Log-likelihood at the correct parameters
log_likelihood_correct <- log_likelihood(random_numbers, 5, 5)
print(log_likelihood_correct)
[1] -147.4507

# Log-likelihood at incorrect parameters
log_likelihood_incorrect <- log_likelihood(random_numbers, 6, 6)
print(log_likelihood_incorrect)
[1] -150.5959

# Comparison
print(log_likelihood_correct > log_likelihood_incorrect)
[1] TRUE

Answer 2 (score 5, by Durden):

First, as others have mentioned, we usually work with the logarithm of the likelihood function, for various mathematical and computational reasons.

Second, since the likelihood function depends on the data, it is convenient to transform it into a function with a standardized maximum (see Pickles 1986):

$$ R(\theta) = \frac{L(\theta)}{L(\theta^\ast)} \quad \text{where } \theta^\ast = \arg\max_{\theta} L(\theta) $$

set.seed(123)
random_numbers <- rnorm(50, mean = 5, sd = 5)
max_likelihood <- prod(dnorm(random_numbers, mean = 5, sd = 5))

nonmax_likelihood <- rep(0, 1000)
j <- 1
for (k in seq(0, 10, length.out = 1000)) {
  nonmax_likelihood[j] <- prod(dnorm(random_numbers, mean = k, sd = 5))
  j <- j + 1
}

par(mfrow = c(1, 2))
plot(seq(0, 10, length.out = 1000), nonmax_likelihood / max_likelihood,
     xlab = "Mean", ylab = "Relative likelihood")
plot(seq(0, 10, length.out = 1000), log(nonmax_likelihood) - log(max_likelihood),
     xlab = "Mean", ylab = "Relative log-likelihood")

[Figure: relative likelihood (left) and relative log-likelihood (right) as functions of the mean, both peaking near 5.]

Comments:

* Michael Lew: I would say that the mathematical convenience of using log-likelihood functions is more than counterbalanced by the unintuitiveness introduced by the log scale. In the figures you supplied, the support by the data for means near 5 is much more easily seen in the linear likelihood graph.
* Michael Lew: I would also add that scaling the likelihood function to have unit maximum is possible because likelihoods are only used as ratios. It is also worth noting that you have only varied the mean parameter, whereas the question varied both the mean and the spread. (I mention this only because the OP seems to be new to likelihoods.)
* Michael Hardy: You do not typically work with the logarithm of the likelihood function when multiplying the prior by the likelihood function and then normalizing to get the posterior distribution.

Answer 3 (score 3, by Michael Lew):

I can think of two things that might help you.

First, likelihoods are defined only up to a proportionality factor, and their utility comes from their use in ratios: while they are proportional to the relevant probability, they are not probabilities. That means that if you are uncomfortable with values in the range of $10^{-65}$, you could simply multiply them all by $10^{65}$ without changing the ratios.
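A minimal sketch of that rescaling, reusing the question's data (the factor 1e65 is arbitrary):

l_correct <- prod(dnorm(random_numbers, mean = 5, sd = 5))  # 9.183016e-65
l_wrong   <- prod(dnorm(random_numbers, mean = 6, sd = 6))  # 3.954015e-66
l_correct / l_wrong                    # about 23.2
(l_correct * 1e65) / (l_wrong * 1e65)  # same ratio -- rescaling changes nothing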
Of course, there is no need to do so, as taking the ratio effectively does it for you. The likelihood ratio for the two distributions is about 23 in favour of the (5, 5) distribution over the (6, 6) distribution. That would typically be regarded as fairly strong (but not overwhelming) support by the data, and the statistical model, for the (5, 5) distribution over the (6, 6) distribution.

Second, I usually find a plot of the likelihood as a function of a parameter to be helpful. You have set up the system with two parameters that are effectively 'of interest', so the relevant likelihood function would be three-dimensional and thus awkward: its dimensions are the population mean, the standard deviation, and the likelihood value. It is easier to fix one of those parameters and explore the likelihood as a function of the other, as in the sketch below. My justification for looking at the full likelihood function, rather than a single ratio of two selected points in parameter space, is that it contains more information and allows the data to speak with less distortion.
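A minimal sketch of that fixed-parameter slice, holding the mean at its generating value of 5 and profiling the likelihood over the standard deviation (the grid from 2 to 12 is an arbitrary choice):

set.seed(123)
random_numbers <- rnorm(50, mean = 5, sd = 5)

# Relative likelihood as a function of sd, with the mean fixed at 5
sds <- seq(2, 12, length.out = 200)
lik <- sapply(sds, function(s) prod(dnorm(random_numbers, mean = 5, sd = s)))
plot(sds, lik / max(lik), type = "l",
     xlab = "SD (mean fixed at 5)", ylab = "Relative likelihood")

The curve peaks near the generating value of 5, and dividing by the maximum removes the uncomfortable 1e-65 scale entirely.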