https://grossack.site/2023/11/08/37-median.html
Skip to main content Site logo
Chris Grossack /
Math/Music. They/Them.
x Blog About
A truly incredible fact about the number 37
08 Nov 2023 - Tags: sage
So I was on math stackexchange the other day, and I saw a cute post
looking for a book which lists, for many many integers, facts that
Ramanujan could have told Hardy if he'd taken a cab other than 1729.
A few days ago OP answered their own question, saying that the book
in question was Those Fascinating Numbers by Jean-Marie De Koninck. I
decided to take a glance through it to see what kinds of facts lie
inside (and also to see just how many integers are covered!). Not
only was I overwhelmed by the number of integers and the number of
facts about them, the preface already includes one of the single
wildest facts I've ever heard, and I have to talk about it here!
Here's a direct quote from the preface:
37, the median value for the second prime factor of an integer;
thus the probability that the second prime factor of an integer
chosen at random is smaller than 37 is approximately $\frac{1}{2}
$;
My jaw was on the floor when I read this, haha. First it sounded
totally unbelievable, since 37 is a tiny number in the grand scheme
of things. Then it started to sound slightly more plausible... After
all, about half of all integers have $2$ as their smallest prime
factor. It makes sense that smaller primes should be more frequent
among the smallest factors of numbers! But then I thought "how can
you possibly prove this!?". I'm not much of an analytic number
theorist^1, but I know that they have good estimates on a lot of
facts like this. I decided it would be fun to try and find and
understand a proof of this fact, and also write some sage code to
test it!
So then let's go ahead and do it ^_^
---------------------------------------------------------------------
First, I think, the sage code. I want to know if this really works!
"Obvoiusly" there's no uniform distribution on the natural numbers,
so what does it even mean to choose a "random" one? The way the
number theorists usually solve this problem is by fixing a large
number $N$ and looking at the probabilities when you pick a random
number between $1$ and $N$. Then you look at the $N \to \infty$ limit
of these probabilities.
So for us, we'll want to first fix a large number $N$ and then work
with numbers $\leq N$. For $N$ kind of small, we can just find the
second prime factor of each number $\leq N$ and check the median!
When I first ran this code, it honestly felt like magic, haha. What
the hell is going on here!?
---------------------------------------------------------------------
The key idea, found in a paper of De Koninck and Tenenbaum^2, is that
we can compute the density of numbers whose second prime is $p$
(which the authors denote $\lambda_2(p)$) by cleverly using the ideas
in the Sieve of Eratosthenes!
Let's do a simple example to start. What fraction of numbers have $5$
as their second prime? In the language of the paper, what is $\
lambda_2(5)$?
Well it's not hard to see that the numbers whose second prime is $5$
are those numbers whose prime factorization looks like
\[2^a 3^0 5^b \cdots\]
or
\[2^0 3^a 5^b \cdots\]
so we need to count the density of numbers of these forms.
But a number is of the first form ($2^a 3^0 5^b \cdots$) if and only
if it has a factor of $2$, a factor of $5$, and no factors of $3$.
To bring this back to elementary school^3, we can highlight all of
our numbers with a factor of $2$
[number-grid-evens]
numbers with no factors of $3$
[number-grid-no-threes]
and numbers with a factor of $5$
[number-grid-fives]
Then the numbers whose prime factorization starts $2^a 3^0 5^b \
cdots$ are exactly the numbers highlighted by all three of these
colors!
[number-grid-intersection]
It's intuitively clear that $\frac{1}{2}$ the numbers are blue, $\
frac{2}{3}$ are orange, and $\frac{1}{5}$ are pink. So taken
together, $\frac{1}{2} \cdot \frac{1}{5} \cdot \frac{2}{3} = \frac{1}
{15}$ of numbers are of this form!
So now we have our hands on the density of numbers of the form $2^a 3
^0 5^b$, but this is only one of two ways that $5$ can be the second
smallest prime. A similar computation shows that $\left ( 1 - \frac
{1}{2} \right ) \cdot \frac{1}{3} \cdot \frac{1}{5} = \frac{1}{30}$
of numbers are of the form $2^0 3^a 5^b$.
It's easy to see that these sets are disjoint, so their densities
add, and $\frac{1}{15} + \frac{1}{30} = \frac{1}{10}$ numbers have
$5$ as their second smallest factor!
Now with the warm-up out of the way, let's see how we can compute $\
lambda_2(p)$ for our favorite prime $p$!
We'll play exactly the same game. How can $p$ be the second smallest
prime? Exactly if the prime factorization looks like
\[p^b q^a \prod_{q \neq r \lt p} r^0\]
for some $q \lt p$.
But we can count these densities as before! For each choice of $q$,
we know that $\frac{1}{p}$ numbers are multiples of $p$, $\frac{1}{q}
$ are multiples of $q$, and for each $r$ we know $\left (1 - \frac{1}
{r} \right )$ numbers are not multiples of $r$! For each $q$, then,
we want to land in the intersection of all of these sets, then we
want to sum over our choices of $q$. Taken together, we see that
The density of numbers whose second prime is $p$ is
\[\lambda_2(p) = \sum_{q \lt p} \frac{1}{p} \frac{1}{q} \prod_{q \neq
r \lt p} \left ( 1 - \frac{1}{r} \right )\]
We can rearrange this to
\(\displaystyle \lambda_2(p) = \frac{1}{p} \left [ \prod_{q \lt p} \
left ( 1 - \frac{1}{q} \right ) \right ] \sum_{q \lt p} \frac{1}{q} \
left ( 1 - \frac{1}{q} \right )^{-1}\)
As a cute exercise, write $\lambda_k(p)$ for the density of numbers
whose $k$th prime is $p$.
De Koninck and Tenenbaum mention in passing that
\[\displaystyle \lambda_k(p) = \frac{1}{p} \left [ \prod_{k \lt p} \
left ( 1 - \frac{1}{q} \right ) \right ] s_{k-1}(p)\]
where $s_j(p) = \sum \frac{1}{m}$ is a sum over all $m$ who have
exactly $j$ prime factors, all of which are $\lt p$.
Can you prove that this formula is correct^4?
---------------------------------------------------------------------
But remember the goal of all this! We want to know the prime \(p^*\)
so that half of all numbers have their second prime \(\leq p^*\).
That is, so that the sum of densities
\[\lambda_2(2) + \lambda_2(3) + \lambda_2(5) + \ldots + \lambda_2(p^
*) \approx \frac{1}{2}.\]
But we can implement $\lambda_2(-)$ and just check for which prime
this happens!
Again we see that $37$ is the prime where roughly half of all numbers
have something $\leq 37$ as their first prime! So we've proven that
$37$ is the median second prime!
Also, this shows that we expect the actual density to be $\approx
.5002$. If we set $N = 10^7$ in the code from the first half^5 to get
a better approximation, we get $.5002501$, which is remarkably close
to the truth!
As another cute exercise - using the ideas in this post, can you
compute the median third prime?
As a (much) harder exercise^6, can you get asymptotics for how the
median $k$th prime grows as a function of $k$?
---------------------------------------------------------------------
Thanks for hanging out, all! This was a really fun post to write up,
and I'm really excited to share it with everybody! This fact about
$37$ was all I could think about for like a week, haha.
I have more blog posts coming, of course, so I'll see you all soon!
Stay safe, and stay warm ^_^
---------------------------------------------------------------------
1. Absolutely the understatement of the year -
2. Sur la loi de repartition du k-ieme facteur premier d'un entier
Yes, this paper is in french, but it's really not so hard to
read, especially with liberal use of google translate. Though if
you want to avoid reading it, I've done the hard work for you,
and everything in this blog post is in english.
It also wasn't too hard to find this paper, thankfully. It's
mentioned in a footnote in the entry for $37$ in Those
Fascinating Numbers, so I had a decent starting point. -
3. I literaly got the base image by googling "grid of numbers high
res" and clicking the first result, which was for elementary
schoolers -
4. It might be helpful to remember a generating function trick that
shows up fairly often (for instance in partitions and the riemann
zeta function):
\[\sum \frac{1}{n} = \prod_p \left ( 1 - \frac{1}{p} \right )^
{-1}\]
Don't worry that this sum diverges for now. Just take note of why
these two sides are equal. You should expand each term of the
right hand side as a geometric series, then check what happens
when you foil. -
5. (and run it locally, since factoring numbers that big takes so
long that the online sagecell times out) -
6. If you read french, the De Koninck and Tenenbaum paper we've been
referencing all post (Sur la loi de repartition du k-ieme facteur
premier d'un entier) is actually all about analyzing these
asymptotics!
If we write \(p_k^*\) for the median $k$th prime, then they show:
\[\log \log p_k^* = k - b + O \left ( \frac{1}{\sqrt{k}} \right )
\]
where $b = \frac{1}{3} + \gamma - \sum_p \left ( \log ((1-1/p)^
{-1}) - 1/p \right )$ and $\gamma$ is the Euler-Mascheroni
Constant. -
(c) Chris Grossack, 2020