Subj : Random Subset of Linked List
To   : comp.programming
From : mschaef
Date : Tue Sep 13 2005 09:50 am


I have a question that seems like it might be appropriate for this group: 
Given a linked list, I'm trying to pick a random subset with a specific 
number of items. The constraint is that all elements in the original list 
have the same chance of ending up in the final subset. 

Is there a standard way people accomplish this? So far, what I've come up 
with is this:

1. Generate a list of random, unique indices into the original list:
   a. Start with an empty list of indices
   b. Generate a random index
   c. If the index is not in the list of indices, add it
   d. If the list of indices is long enough, go back to a.

2. Sort the list of indices. (It can also be kept sorted, to make
   step 1c faster).

3. Traverse the list of indices and the original list in parallel,
   creating the new list of the elements denoted by the list of indices.

I like this approach in that it seems likely to produce an unbiased 
result. However, step 1 makes me uncomfortable: there's no guarantee that 
the RNG will produce enough unique random numbers in a reasonable amount 
of time. This is particularly true if the subset is to be similarly sized 
to the original list.

My second approach is basically to copy the list into an array, 
randomly shuffle it 'enough', and pull off the first n elements into 
another list. That's not too bad, although for large initial lists there's 
a lot of copying, and for small subsets there's a lot of extra work 
involved. I'm also not quite sure how to define 'enough' shuffling.

Any thoughts? suggestions?

Thanks,
Mike
-- 
http://www.mschaef.com

.