Subj : Re: Random Subset of Linked List
To   : comp.programming
From : cri
Date : Tue Sep 13 2005 03:57 pm

On Tue, 13 Sep 2005 08:50:47 -0500, mschaef@fnord.io.com (MSCHAEF.COM)
wrote:

>
>I have a question that seems like it might be appropriate for this group: 
>Given a linked list, I'm trying to pick a random subset with a specific 
>number of items. The constraint is that all elements in the original list 
>have the same chance of ending up in the final subset. 
>
>Is there a standard way people accomplish this? So far, what I've come up 
>with is this:
>
>1. Generate a list of random, unique indices into the original list:
>   a. Start with an empty list of indices
>   b. Generate a random index
>   c. If the index is not in the list of indices, add it
>   d. If the list of indices is long enough, go back to a.
>
>2. Sort the list of indices. (It can also be kept sorted, to make
>   step 1c faster).
>
>3. Traverse the list of indices and the original list in parallel,
>   creating the new list of the elements denoted by the list of indices.
>
>I like this approach in that it seems likely to produce an unbiased 
>result. However, step 1 makes me uncomfortable: there's no guarantee that 
>the RNG will produce enough unique random numbers in a reasonable amount 
>of time. This is particularly true if the subset is to be similarly sized 
>to the original list.
>
>My second approach is basically to copy the list into an array, 
>randomly shuffle it 'enough', and pull off the first n elements into 
>another list. That's not too bad, although for large initial lists there's 
>a lot of copying, and for small subsets there's a lot of extra work 
>involved. I'm also not quite sure how to define 'enough' shuffling.
>
>Any thoughts? suggestions?

See http://home.tiac.net/~cri/2004/selectk.html 



Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
I started out in life with nothing.
I still have most of it left.

.