[HN Gopher] Histograms for Probability Density Estimation: A Primer
       ___________________________________________________________________
        
       Histograms for Probability Density Estimation: A Primer
        
       Author : vvanirudh
       Score  : 21 points
       Date   : 2024-04-08 22:06 UTC (3 days ago)
        
 (HTM) web link (vvanirudh.github.io)
 (TXT) w3m dump (vvanirudh.github.io)
        
       | Bostonian wrote:
       | If the data is continuous, use kernel density estimation (KDE)
       | instead of histograms to visualize the probability density, since
       | KDE will give a smoother fit. A similar idea is to fit a mixture
       | of normals -- there are numerous R packages for this, and
       | sklearn.mixture.GaussianMixture in scikit-learn.
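
As a rough illustration of the histogram-versus-KDE comparison above, the sketch below evaluates both estimators on the same sample. The synthetic bimodal data, the seed, the bin count, and the evaluation grid are all assumptions made for this example; none of them come from the thread. It uses `numpy.histogram` and `scipy.stats.gaussian_kde` with its default bandwidth rule.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic bimodal sample (an assumption for illustration only).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.0, 1.0, 500)])

# Histogram density estimate: piecewise constant, depends on the bin count.
hist_density, bin_edges = np.histogram(data, bins=30, density=True)

# Gaussian KDE: a smooth estimate; the bandwidth defaults to Scott's rule.
kde = gaussian_kde(data)
grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 400)
kde_density = kde(grid)

# Both estimates should integrate to roughly 1 over their support.
hist_area = float(np.sum(hist_density * np.diff(bin_edges)))
kde_area = float(np.sum(kde_density) * (grid[1] - grid[0]))
print(f"histogram area: {hist_area:.3f}, KDE area: {kde_area:.3f}")
```

Plotting `hist_density` as bars and `kde_density` against `grid` would show the blocky and smooth versions of the same two-bump shape.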
        
         | vvanirudh wrote:
         | Yep! The next post will be on kernel density estimation --
         | wanted to start from histograms as they are still a useful tool
         | in 1-D and 2-D density estimation, and you don't have to store
         | the data either (unlike KDE).
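
The storage point can be sketched as follows: with bin edges fixed in advance, a histogram only needs one counter per bin, so each batch of data can be discarded after it is counted. The class name `StreamingHistogram`, the range `[-5, 5]`, the bin count, and the batch sizes are hypothetical choices for this example, not something described in the post.

```python
import numpy as np

class StreamingHistogram:
    """Fixed-bin histogram updated batch by batch (hypothetical helper)."""

    def __init__(self, low, high, num_bins):
        self.edges = np.linspace(low, high, num_bins + 1)
        self.counts = np.zeros(num_bins, dtype=np.int64)

    def update(self, batch):
        # Only the counts are kept; samples outside [low, high] are ignored.
        batch_counts, _ = np.histogram(batch, bins=self.edges)
        self.counts += batch_counts

    def density(self):
        # Normalize counts so that the estimate integrates to 1.
        widths = np.diff(self.edges)
        total = self.counts.sum()
        return self.counts / (total * widths)

rng = np.random.default_rng(0)
hist = StreamingHistogram(low=-5.0, high=5.0, num_bins=40)
for _ in range(100):
    hist.update(rng.normal(size=1_000))  # each batch is discarded after use

density = hist.density()
print(hist.counts.sum(), float(np.sum(density * np.diff(hist.edges))))
```

Here 100,000 samples are summarized by 40 integers. A plain KDE, by contrast, evaluates a kernel at every stored sample, which is why it needs the data kept around.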
        
           | Bostonian wrote:
           | I should have read to the end of your post:
           | 
           | 'I will describe a very popular nonparametric method, Kernel
           | Density Estimation, that also follows strategy 1 and is much
           | more scalable to higher dimensions than histograms.'
        
             | vvanirudh wrote:
             | Haha no worries!
        
       | bagrow wrote:
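       | The best way to compute the empirical CDF (ECDF) is by sorting
       | the data:
       | 
       |     import numpy as np
       |     import matplotlib.pyplot as plt
       | 
       |     N = len(data)
       |     X = sorted(data)
       |     Y = np.arange(1, N + 1) / N  # fraction of points <= each X
       |     plt.plot(X, Y)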
       | 
       | Technically, you should plot this with `plt.step`.
        
         | andrewla wrote:
         | scipy even has a built-in method (scipy.stats.ecdf) for doing
         | exactly this.
        
       | sobriquet9 wrote:
       | Why estimate the PDF through a histogram and then convert it to a
       | CDF, when one can estimate the CDF directly? Doing so also avoids
       | having to choose a bin width, which can have a substantial impact
       | on the result.
        
         | andrewla wrote:
         | Agreed -- very odd to use a parameter (bin width) in a
         | nonparametric estimation. Just use the raw data. In numerical
         | analysis, broadly speaking, integrals are stable while
         | derivatives are wild; an empirical cdf is a nice smooth
         | integral of the messy pdf.
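
To get a feel for the bin-width sensitivity discussed here, the sketch below draws a standard normal sample (so the true PDF and CDF are known) and compares the ECDF error with the histogram density error for a few bin counts. The sample size, seed, and the bin counts 5, 30, and 2,000 are arbitrary assumptions for this example.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
data = np.sort(rng.normal(size=n))  # standard normal, so the truth is known

def normal_pdf(x):
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

# ECDF: no tuning parameter. Largest gap to the true CDF at the sample points.
ecdf = np.arange(1, n + 1) / n
ecdf_error = float(np.max(np.abs(ecdf - normal_cdf(data))))

# Histogram density: the error depends strongly on the number of bins.
hist_errors = {}
for num_bins in (5, 30, 2_000):
    density, edges = np.histogram(data, bins=num_bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    hist_errors[num_bins] = float(np.max(np.abs(density - normal_pdf(centers))))

print(f"ECDF max error: {ecdf_error:.4f}")
for num_bins, err in hist_errors.items():
    print(f"{num_bins:>5} bins -> max density error at bin centers: {err:.4f}")
```

The CDF and PDF errors are not on the same scale, so the numbers are not directly comparable. The point is only that the histogram error moves a lot as the bin count changes, while the ECDF has nothing to tune.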
        
       ___________________________________________________________________
       (page generated 2024-04-11 23:01 UTC)