Subj : implementing zipf distribution?
To   : comp.theory,comp.programming
From : Digital Puer
Date : Sat Aug 06 2005 06:46 pm

I am writing a simulation and would like to implement
a Zipf distribution. I could use some help.

Suppose there are 20 files duplicated across 100 servers,
where each server can hold 5 unique files. I would like
to distribute the 20 files among the servers with a
Zipf distribution. I think that means that the i-th
most popular file appears with probability 1/i^a,
where i=1 is the most popular and a is supposed to be
near 1 (say, it is 1).
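For reference, the usual normalized form of the Zipf law over N items treats 1/i^a as a relative weight. The probability divides it by the generalized harmonic number, so the N values sum to 1. With N = 20 and a = 1 the denominator is H_20 ≈ 3.598, which gives roughly 0.278 for file 1 and 0.0139 for file 20.

```latex
P(i) = \frac{1/i^{a}}{\sum_{k=1}^{N} 1/k^{a}} = \frac{1/i^{a}}{H_{N,a}},
\qquad i = 1, \dots, N
```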

Each server needs to select 5 files. Is the following
a good algorithm?

- create a vector
- fill the vector with integers in the range [1,20],
  representing the 20 files
- integer "1" appears, say, 1000 times
- integer "2" appears 1/2 as many times as "1" (500)
- integer "3" appears 1/3 as many times as "1" (333)
- ...
- integer "20" appears 1/20 as many times as "1" (50)
- each server then randomly draws integers from this
  vector, putting each one back after it is drawn. No
  duplicates are allowed for a given server, so it keeps
  drawing until it holds 5 unique integers
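A minimal Python 3 sketch of the vector-based procedure in the list, using only the standard `random` module. The names `build_vector` and `pick_files` and the seed are illustrative, not part of the original post. Adding draws to a set reproduces "keep choosing (and putting back)": a repeated file is simply ignored and another draw is made.

```python
import random

NUM_FILES = 20
NUM_SERVERS = 100
FILES_PER_SERVER = 5
TOP_COUNT = 1000  # copies of file 1, the "starting number" in the post


def build_vector(num_files=NUM_FILES, top_count=TOP_COUNT):
    """File i appears top_count // i times: 1000, 500, 333, ..., 50."""
    vector = []
    for i in range(1, num_files + 1):
        vector.extend([i] * (top_count // i))
    return vector


def pick_files(vector, k=FILES_PER_SERVER, rng=random):
    """Draw uniformly from the vector with replacement; a file the
    server already holds is ignored, so drawing continues until the
    server has k distinct files."""
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.choice(vector))
    return sorted(chosen)


vector = build_vector()
rng = random.Random(2005)  # fixed seed so runs are repeatable
servers = [pick_files(vector, rng=rng) for _ in range(NUM_SERVERS)]
print(len(vector))
print(servers[0])
```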


Is the above correct?

The problem I see is that the most popular file (file "1")
is selected with probability 1000/(1000+500+333+...+50),
which doesn't jibe with the 1/i definition.
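The concern above can be checked numerically. The following Python sketch assumes the same counts (1000 // i copies of file i) and compares the per-draw probability of each file in the vector with the normalized Zipf probability (1/i)/H_20 for a = 1. The 1/i^a form fixes only the ratios between files; any probability has to be divided by a total, and the sum 1000+500+...+50 plays that role. The small remaining gap comes from rounding 1000/i down to an integer. These are per-draw probabilities only: redrawing duplicates until a server holds 5 distinct files means the fraction of servers holding a given file is not simply proportional to 1/i.

```python
from fractions import Fraction

NUM_FILES = 20
TOP_COUNT = 1000

# Copies of each file in the vector, as in the list above.
counts = {i: TOP_COUNT // i for i in range(1, NUM_FILES + 1)}
total = sum(counts.values())  # 3590

# Per-draw probability of each file when picking uniformly from the vector.
vector_p = {i: c / total for i, c in counts.items()}

# Normalized Zipf probability with a = 1: (1/i) / H_20, computed exactly.
h20 = sum(Fraction(1, k) for k in range(1, NUM_FILES + 1))
zipf_p = {i: float(Fraction(1, i) / h20) for i in range(1, NUM_FILES + 1)}

for i in (1, 2, 3, 20):
    print(i, round(vector_p[i], 4), round(zipf_p[i], 4))
```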

It also appears that there has to be a starting number of
appearances for the most popular item (in this case, 1000)
as a parameter to the problem.
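If the vector is replaced by explicit weights, the starting count disappears. A sketch assuming Python 3.6+, where `random.choices` selects according to relative weights; the helpers `zipf_weights` and `pick_files_weighted` are hypothetical names, and the redraw-on-duplicate rule is the same one described in the list. Multiplying every weight by 1000 would not change the distribution, so 1000 acts as a scale factor rather than a real parameter; the exponent a is the parameter that matters.

```python
import random


def zipf_weights(num_files, a=1.0):
    """Relative (unnormalized) Zipf weights 1/i^a for ranks 1..num_files."""
    return [1.0 / i ** a for i in range(1, num_files + 1)]


def pick_files_weighted(num_files, k, a=1.0, rng=random):
    """Pick k distinct files. Each candidate is drawn with probability
    proportional to 1/i^a; a file already held is ignored and another
    draw is made."""
    files = list(range(1, num_files + 1))
    weights = zipf_weights(num_files, a)
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.choices(files, weights=weights)[0])
    return sorted(chosen)


rng = random.Random(42)  # fixed seed so runs are repeatable
servers = [pick_files_weighted(20, 5, a=1.0, rng=rng) for _ in range(100)]
print(servers[0])
```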


Thanks for any help.
