Subj : Re: Random Subset of Linked List To : comp.programming From : cri Date : Tue Sep 13 2005 03:57 pm On Tue, 13 Sep 2005 08:50:47 -0500, mschaef@fnord.io.com (MSCHAEF.COM) wrote: > >I have a question that seems like it might be appropriate for this group: >Given a linked list, I'm trying to pick a random subset with a specific >number of items. The constraint is that all elements in the original list >have the same chance of ending up in the final subset. > >Is there a standard way people accomplish this? So far, what I've come up >with is this: > >1. Generate a list of random, unique indices into the original list: > a. Start with an empty list of indices > b. Generate a random index > c. If the index is not in the list of indices, add it > d. If the list of indices is long enough, go back to a. > >2. Sort the list of indices. (It can also be kept sorted, to make > step 1c faster). > >3. Traverse the list of indices and the original list in parallel, > creating the new list of the elements denoted by the list of indices. > >I like this approach in that it seems likely to produce an unbiased >result. However, step 1 makes me uncomfortable: there's no guarantee that >the RNG will produce enough unique random numbers in a reasonable amount >of time. This is particularly true if the subset is to be similarly sized >to the original list. > >My second approach is basically to copy the list into an array, >randomly shuffle it 'enough', and pull off the first n elements into >another list. That's not too bad, although for large initial lists there's >a lot of copying, and for small subsets there's a lot of extra work >involved. I'm also not quite sure how to define 'enough' shuffling. > >Any thoughts? suggestions? See http://home.tiac.net/~cri/2004/selectk.html Richard Harter, cri@tiac.net http://home.tiac.net/~cri, http://www.varinoma.com I started out in life with nothing. I still have most of it left. .