[HN Gopher] The Surprising Predictability of Long Runs (2012) [pdf]
___________________________________________________________________
The Surprising Predictability of Long Runs (2012) [pdf]
Author : alexmolas
Score : 13 points
Date : 2024-10-11 14:40 UTC (8 hours ago)
(HTM) web link (www.csun.edu)
| nuancebydefault wrote:
| I once saw a chart on some website showing the distribution of
| flat tire events. Often one goes 10 years without a flat and then
| suddenly has 2 or 3 in a single year. Mathematically, the chances
| of such clustering are quite high.
| fastaguy88 wrote:
| One of the major breakthroughs in Bioinformatics was the
| recognition that local similarity scores (which can be thought of
| as runs of positive sequence similarity) are extreme-value
| distributed.[0] The logic of that discovery uses almost exactly
| the same mathematical argument as this paper [1]; indeed, I
| recognized some of the same equations.
|
| It is difficult to overstate the importance of this discovery for
| biology: today, the vast majority of protein functional
| inferences for newly sequenced genomes are based on the
| statistics of long runs of sequence similarity.
|
| [0] https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
| [1] https://www.pnas.org/doi/epdf/10.1073/pnas.87.6.2264
| wenc wrote:
| This is an interesting finding. There are two takeaways from the
| paper.
|
| 1. The length L of the longest streak of successes in an
| independent Bernoulli process with success probability p (with
| q = 1-p) over n trials can easily be estimated.
|
| L = log_{1/p} (n*q)
|
| 2. This estimate becomes more accurate as p decreases, because
| the distribution of L is an extreme value distribution that gets
| more concentrated as p decreases.
|
| This means that for low values of p, L becomes more predictable.
|
| I don't know how this result will change my life, but at least
| now I know that I can predict streaks if I know p.
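
The estimate in wenc's comment can be checked numerically. Below is a
minimal Python sketch, assuming independent trials as in the comment.
The function names and the parameters n = 200, p = 0.5 are illustrative
choices, not taken from the paper. It compares L = log_{1/p}(n*q) with
the average longest streak over repeated simulated sequences.

```python
import math
import random


def predicted_longest_run(n: int, p: float) -> float:
    """Estimate from the comment: L = log_{1/p}(n * q), with q = 1 - p."""
    q = 1.0 - p
    return math.log(n * q) / math.log(1.0 / p)


def longest_run(trials) -> int:
    """Length of the longest run of consecutive successes (truthy values)."""
    best = current = 0
    for success in trials:
        # Extend the current streak on success, reset it on failure.
        current = current + 1 if success else 0
        best = max(best, current)
    return best


def simulate_mean_longest_run(n: int, p: float, repeats: int, seed: int = 0) -> float:
    """Average longest streak over `repeats` simulated sequences of n trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(repeats):
        total += longest_run(rng.random() < p for _ in range(n))
    return total / repeats


if __name__ == "__main__":
    n, p = 200, 0.5  # hypothetical example: 200 fair coin flips
    print("predicted:", round(predicted_longest_run(n, p), 2))
    print("simulated:", round(simulate_mean_longest_run(n, p, repeats=2000), 2))
```

For fair coin flips the simulated average should land within roughly
one flip of the formula. Rerunning with other values of n and p is a
quick way to see how well the estimate holds up.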
___________________________________________________________________