Subj : Re: Generating an Index
To   : comp.programming
From : Wavemaker
Date : Fri Sep 09 2005 03:30 am

"Willem" wrote:
> Wavemaker wrote:
> ) Hmm, I've thought about this and was about to suggest using
> ) 10101010... as a reference value and calculating the distance of
> ) each item in reference to that value.
>
> There is no reason whatsoever why 10101010... would be a better
> reference value than 00000000... would be. To see this, you have to
> realise that XORing all numbers with a given pattern will not change
> the relative distances of any two numbers.

You're right. I think I chose that value because it intuitively seemed better without having a concrete reason. Assuming this approach has merit, something I'm not sure of at all, then any value would do as a reference value. Correct?

The reason I thought using a reference value might help was Mary's example of the two values, 01111111 and 11111111 (give or take a few bits). They have a distance of 1. Assuming those two items are located thousands of items apart, sorting them by distance to a reference value brings them closer together (assuming a reference value of 00000000).

But as you've pointed out, moving values around like this causes some values that should be close together to move farther apart. If you take the values 11110000 and 00001111 and sort them by distance to a reference value of 00000000, there is a problem because they both have the same distance to the reference value but have a maximum distance to each other. But according to the sort, they should be side-by-side.

Whew, I'm back to thinking that this problem may not be solvable. At least not by me.

This problem interested me enough to write a small program to try out various approaches. One thing I noticed is that sorting by distance to a reference value brought duplicate values closer together, which is what you would expect.
Actually, since my data set was small, it always brought them right next to each other, but in a larger data set I don't see that as guaranteed.

Duplicate items would have a distance of 0 between them, of course. If there are a lot of duplicates, and the goal is, having found a particular item in the data set, to then find the item with the smallest distance from it (which would be 0), this approach may have merit. Maybe. Mary will have to explore this further to see if this approach is actually useful.
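
A rough Python sketch of the kind of experiment described above. This is not the actual program mentioned in this post. It assumes the items are small unsigned integers and that "distance" means Hamming distance (the number of differing bits), which matches the 01111111 / 11111111 example. The names hamming and sort_by_reference are made up for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ (assumed metric)."""
    return bin(a ^ b).count("1")


def sort_by_reference(items, reference=0):
    """Order items by Hamming distance to a reference value.

    Python's sort is stable, so items with equal distance keep
    their original relative order.
    """
    return sorted(items, key=lambda x: hamming(x, reference))


# XORing every value with the same pattern leaves pairwise distances
# unchanged, so 10101010 is no better a reference than 00000000.
a, b, mask = 0b01111111, 0b11111111, 0b10101010
assert hamming(a, b) == hamming(a ^ mask, b ^ mask) == 1

# The problem case: same distance to the reference, maximum distance apart.
x, y = 0b11110000, 0b00001111
print(hamming(x, 0), hamming(y, 0), hamming(x, y))  # 4 4 8

# Duplicates always share a sort key, but other values with the same key
# can still end up between them, so adjacency is not guaranteed.
items = [0b0011, 0b1100, 0b0011]
print([format(v, "04b") for v in sort_by_reference(items)])
# ['0011', '1100', '0011']
```

The last case shows why duplicates landed side by side only by luck in a small data set: every value with the same bit count ties with them in the sort.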