[HN Gopher] Histograms for Probability Density Estimation: A Primer
___________________________________________________________________
Histograms for Probability Density Estimation: A Primer
Author : vvanirudh
Score : 21 points
Date : 2024-04-08 22:06 UTC (3 days ago)
(HTM) web link (vvanirudh.github.io)
(TXT) w3m dump (vvanirudh.github.io)
| Bostonian wrote:
| If the data is continuous, use kernel density estimation (KDE)
| instead of histograms to visualize the probability density, since
| KDE will give a smoother fit. A similar idea is to fit a mixture
| of normals -- there are numerous R packages for this, and
| sklearn.mixture.GaussianMixture in scikit-learn.
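To make the KDE suggestion concrete, here is a minimal sketch of a 1-D Gaussian kernel density estimate written directly in NumPy. The function name `gaussian_kde_1d`, the synthetic sample, and the normal-reference bandwidth rule are assumptions chosen for illustration and do not come from the post or the thread; in practice a library routine such as `scipy.stats.gaussian_kde` would usually be the more convenient choice.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at every point of `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One standardized distance per (grid point, data point) pair.
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels, then rescale by the bandwidth so the
    # estimate integrates to one.
    return kernels.mean(axis=1) / bandwidth

# Synthetic data, assumed here only for illustration.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid = np.linspace(-5.0, 5.0, 1001)

# Normal-reference rule of thumb for the bandwidth (an assumption;
# other selectors may suit other data better).
bandwidth = 1.06 * sample.std(ddof=1) * len(sample) ** (-1 / 5)
density = gaussian_kde_1d(sample, grid, bandwidth)
```

Note that the estimate needs the full sample at evaluation time, and that the bandwidth plays a role similar to the bin width of a histogram.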
| vvanirudh wrote:
| Yep! The next post will be on kernel density estimation --
| wanted to start from histograms as they are still a useful tool
| in 1-D and 2-D density estimation, and you don't have to store
| the data either (unlike KDE).
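The storage point can be sketched as follows: once the histogram is built, the density can be evaluated from the bin heights and edges alone. The helper `histogram_pdf`, the choice of 30 bins, and the synthetic sample are hypothetical, picked only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=10_000)

# density=True rescales the counts so that the bars integrate to one.
density, edges = np.histogram(sample, bins=30, density=True)

def histogram_pdf(x, density, edges):
    """Look up the histogram density at x using only heights and edges."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    idx = np.searchsorted(edges, x, side="right") - 1
    # np.histogram treats the last bin as closed on the right.
    idx = np.where(x == edges[-1], len(density) - 1, idx)
    inside = (idx >= 0) & (idx < len(density))
    out = np.zeros_like(x)
    out[inside] = density[idx[inside]]
    return out

# After this point the raw sample is no longer needed: 30 heights and
# 31 edges summarize the 10,000 observations.
values = histogram_pdf([-1.0, 0.0, 1.0], density, edges)
```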
| Bostonian wrote:
| I should have read to the end of your post:
|
| 'I will describe a very popular nonparametric method, Kernel
| Density Estimation, that also follows strategy 1 and is much
| more scalable to higher dimensions than histograms.'
| vvanirudh wrote:
| Haha no worries!
| bagrow wrote:
| The best way to compute the empirical CDF (ECDF) is by sorting
| the data:
|
|     N = len(data)
|     X = sorted(data)
|     Y = np.arange(N)/N
|     plt.plot(X,Y)
|
| Technically, you should plot this with `plt.step`.
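A slightly fuller sketch of the sorting approach, assuming the common right-continuous convention in which the ECDF at the i-th smallest observation equals i/N. This differs by 1/N from the `np.arange(N)/N` values in the comment; both conventions appear in practice. The helper names `ecdf` and `ecdf_at` are hypothetical.

```python
import numpy as np

def ecdf(data):
    """Return sorted observations and the ECDF value at each of them."""
    x = np.sort(np.asarray(data, dtype=float))
    # Fraction of observations less than or equal to each sorted value.
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def ecdf_at(data, t):
    """Evaluate the ECDF at arbitrary points t."""
    x = np.sort(np.asarray(data, dtype=float))
    # side="right" counts observations <= t.
    return np.searchsorted(x, t, side="right") / x.size

x, y = ecdf([3.0, 1.0, 2.0])
```

Passing these arrays to `plt.step(x, y, where="post")` draws the staircase: the curve jumps at each observation and stays flat until the next one.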
| andrewla wrote:
| scipy even has a built-in method (scipy.stats.ecdf) for doing
| exactly this.
| sobriquet9 wrote:
| Why estimate PDF through histogram then convert to CDF, when one
| can estimate CDF directly? Doing so also avoids having to choose
| bin width that can have substantial impact.
| andrewla wrote:
| Agreed -- very odd to use a parameter (bin width) in a
| nonparametric estimation. Just use the raw data. In numerical
| analysis, broadly speaking, integrals are stable while
| derivatives are wild; an empirical cdf is a nice smooth
| integral of the messy pdf.
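The contrast can be illustrated numerically under some assumptions: a synthetic Uniform(0, 1) sample, for which the true CDF is F(x) = x and the true density is 1, and three arbitrarily chosen bin counts. This is a sketch of the idea, not a general result about every dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.uniform(size=2000)

# ECDF error measured at the sample points; no tuning parameter involved.
x = np.sort(sample)
ecdf_y = np.arange(1, x.size + 1) / x.size
ecdf_err = np.max(np.abs(ecdf_y - x))

# Histogram density error: the result depends on the bin count.
hist_err = {}
for bins in (5, 50, 500):
    density, _ = np.histogram(sample, bins=bins, range=(0.0, 1.0), density=True)
    hist_err[bins] = np.max(np.abs(density - 1.0))

print("ECDF max error:", ecdf_err)
print("histogram max error by bin count:", hist_err)
```

With many narrow bins each bar holds only a few points, so the density estimate becomes noisy, while the ECDF accumulates all observations and stays close to the true CDF.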
___________________________________________________________________
(page generated 2024-04-11 23:01 UTC)