Subj : Re: Random Subset of Linked List To : comp.programming From : Roger Willcocks Date : Wed Sep 14 2005 11:05 am "MSCHAEF.COM" wrote in message news:_ICdnf414tiqRLveRVn-rw@io.com... > > I have a question that seems like it might be appropriate for this group: > Given a linked list, I'm trying to pick a random subset with a specific > number of items. The constraint is that all elements in the original list > have the same chance of ending up in the final subset. > > Is there a standard way people accomplish this? So far, what I've come up > with is this: > > 1. Generate a list of random, unique indices into the original list: > a. Start with an empty list of indices > b. Generate a random index > c. If the index is not in the list of indices, add it > d. If the list of indices is long enough, go back to a. > > 2. Sort the list of indices. (It can also be kept sorted, to make > step 1c faster). > > 3. Traverse the list of indices and the original list in parallel, > creating the new list of the elements denoted by the list of indices. > > I like this approach in that it seems likely to produce an unbiased > result. However, step 1 makes me uncomfortable: there's no guarantee that > the RNG will produce enough unique random numbers in a reasonable amount > of time. This is particularly true if the subset is to be similarly sized > to the original list. > > My second approach is basically to copy the list into an array, > randomly shuffle it 'enough', and pull off the first n elements into > another list. That's not too bad, although for large initial lists there's > a lot of copying, and for small subsets there's a lot of extra work > involved. I'm also not quite sure how to define 'enough' shuffling. > > Any thoughts? suggestions? Walk down the list either accepting or rejecting each element as you traverse it. Assuming you want to pick K elements from a list length N, the first element should be selected with a probability of K/N. The second element should be selected with a probability of (K-1)/(N-1) or K/(N-1) depending on whether you selected the first element or not. And so on. The final element will be selected with probability 0/1 or 1/1. This is clearly O(N). -- Roger .