hngopher.com

       [HN Gopher] The Surprising Predictability of Long Runs (2012) [pdf]
       ___________________________________________________________________
        
       Author : alexmolas
       Score  : 13 points
       Date   : 2024-10-11 14:40 UTC (8 hours ago)
        
 (HTM) web link (www.csun.edu)
        
       | nuancebydefault wrote:
       | I once saw a chart on some website showing the distribution of
       | flat tire events. Often one goes 10 years without a flat and
       | then suddenly gets 2 or 3 in a single year. Mathematically, the
       | chance of such clustering is quite high.
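A quick way to see why such clustering is unsurprising: assume (purely for illustration) that flat tires arrive as a Poisson process at a constant rate, here a hypothetical 0.3 flats per year. Then the chance that at least one year in a 20-year span has two or more flats is already just over one half. The function name `prob_some_year_with_cluster` and the numbers are made up for this sketch.

```python
import math


def prob_some_year_with_cluster(rate_per_year: float, years: int, k: int = 2) -> float:
    """Probability that at least one of `years` years sees >= k events,
    assuming events follow a Poisson process with the given yearly rate."""
    # P(N < k) for a single year, where N ~ Poisson(rate_per_year)
    p_below_k = sum(
        math.exp(-rate_per_year) * rate_per_year**i / math.factorial(i)
        for i in range(k)
    )
    # Disjoint years are independent under a Poisson process, so multiply
    return 1.0 - p_below_k**years


# Hypothetical numbers: 0.3 flats per year on average, observed over 20 years.
print(round(prob_some_year_with_cluster(0.3, 20), 3))  # prints 0.529
```

The Poisson model is the simplest assumption that captures "rare, independent events"; real flat-tire rates vary with roads, tires, and mileage, so treat the number as an illustration of the effect rather than an estimate.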
        
       | fastaguy88 wrote:
       | One of the major breakthroughs in bioinformatics was the
       | recognition that local similarity scores (which can be thought of
       | as runs of positive sequence similarity) follow an extreme-value
       | distribution [0]. The logic of that discovery [1] uses almost
       | exactly the same mathematical argument as this paper; indeed, I
       | recognized some of the same equations.
       | 
       | It is difficult to overstate the importance of this discovery for
       | biology: today, the vast majority of protein functional
       | inferences for newly sequenced genomes are based on the
       | statistics of long runs of sequence similarity.
       | 
       | [0] https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
       | [1] https://www.pnas.org/doi/epdf/10.1073/pnas.87.6.2264
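To make "extreme-value distributed" concrete: in the Karlin-Altschul statistics used by BLAST (the work cited as [1]), the expected number of chance local alignments scoring at least S between sequences of lengths m and n is E = K m n e^(-λS), and since that count is approximately Poisson, the probability of at least one such hit is P = 1 - e^(-E). The sketch below only evaluates those two formulas. The values of λ and K are placeholders; in practice they depend on the scoring matrix and gap penalties. The function names are hypothetical.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance local alignments with score >= `score`
    between sequences of lengths m and n: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def pvalue_from_evalue(e: float) -> float:
    """Chance hits are ~Poisson(E), so P(at least one) = 1 - exp(-E).
    expm1 keeps precision when E is tiny."""
    return -math.expm1(-e)


# Placeholder parameters, assumed for illustration only.
LAMBDA, K = 0.3, 0.1
for s in (40, 60, 80):
    e = karlin_altschul_evalue(s, m=300, n=1_000_000, lam=LAMBDA, k=K)
    print(s, f"E={e:.3g}", f"P={pvalue_from_evalue(e):.3g}")
```

The exponential tail is the point: every extra 1/λ of score divides E by e, which is why a modest increase in alignment score turns a likely chance hit into a very unlikely one.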
        
       | wenc wrote:
       | This is an interesting finding. There are two takeaways from the
       | paper.
       | 
       | 1. The expected length L of the longest streak (run of successes)
       | in n independent Bernoulli trials with success probability p
       | (and q = 1-p) is easy to estimate:
       | 
       | L ≈ log_{1/p}(n*q)
       | 
       | 2. This estimate becomes more accurate as p decreases, because
       | the distribution of L is an extreme-value distribution that
       | becomes more concentrated as p decreases.
       | 
       | This means that for low values of p, L is more predictable: the
       | estimate above is more likely to be close to the actual value.
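The formula in point 1 can be motivated with a back-of-the-envelope count (a heuristic sketch, not a proof): away from the start of the sequence, a run of at least L successes begins wherever a failure is followed by L successes, which happens with probability q p^L. Over roughly n positions the expected number of such runs is about n q p^L, and the longest run sits near the L at which that count falls to about one:

```latex
n\,q\,p^{L} \approx 1
\quad\Longrightarrow\quad
\left(\tfrac{1}{p}\right)^{L} \approx n q
\quad\Longrightarrow\quad
L \approx \log_{1/p}(n q) = \frac{\ln(n q)}{\ln(1/p)}
```

For a fair coin (p = q = 1/2) this gives L ≈ log_2(n/2) = log_2(n) - 1, so about 9 for n = 1024 tosses.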
       | 
       | I don't know how this result will change my life, but at least
       | now I know that I can predict streaks if I know p (and n).
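Both takeaways can be checked with a small Monte Carlo sketch. It simulates n Bernoulli(p) trials many times, records the longest run of successes in each, and compares the sample mean with log_{1/p}(nq). The values of n, p, the repetition count, and the seed are arbitrary choices, and the helper names are hypothetical. Expect the mean to land within about one run length of the estimate, and the spread to shrink as p decreases.

```python
import math
import random
import statistics


def longest_run(n: int, p: float, rng: random.Random) -> int:
    """Length of the longest run of successes in n independent Bernoulli(p) trials."""
    best = current = 0
    for _ in range(n):
        if rng.random() < p:  # success extends the current run
            current += 1
            if current > best:
                best = current
        else:  # failure ends the run
            current = 0
    return best


def estimate(n: int, p: float) -> float:
    """The approximation L ~ log_{1/p}(n * q), with q = 1 - p."""
    return math.log(n * (1 - p)) / math.log(1 / p)


def summarize(n: int, p: float, reps: int = 500, seed: int = 0) -> tuple[float, float, float]:
    """Return (estimate, sample mean, sample std dev) of the longest run."""
    rng = random.Random(seed)
    runs = [longest_run(n, p, rng) for _ in range(reps)]
    return estimate(n, p), statistics.mean(runs), statistics.stdev(runs)


for p in (0.5, 0.1):
    est, mean, sd = summarize(n=2000, p=p)
    print(f"p={p}: estimate={est:.2f}, simulated mean={mean:.2f}, std dev={sd:.2f}")
```

The simulated mean sits slightly off the estimate because log_{1/p}(nq) is only the leading term; the shrinking standard deviation for p = 0.1 versus p = 0.5 is the concentration effect described in point 2.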
        
       ___________________________________________________________________
       (page generated 2024-10-11 23:01 UTC)