Subj : Re: Generating an Index
To   : comp.programming
From : Mary
Date : Thu Sep 08 2005 02:26 pm

Hi Wavemaker,

Yes, you are correct in your understanding.

I was hoping that I could create a third column in the database that
contains a "weighting" factor (or something like that) based on the
"Data" column, so that when I sort the table on this new column, the
records with the lowest distance between them end up next to each other.
Note that it's not as simple as counting up the 1's, e.g.:

101010101010 and
010101010101
Both have the same number of 1's, but the distance is 12, i.e. they are
100% different.
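
To make "distance" concrete: it is the number of bit positions where two
records differ (the Hamming distance). A minimal sketch in Python, assuming
the "Data" column is held as equal-length strings of 0s and 1s; the function
names are only illustrative:

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def ones_count(bits: str) -> int:
    """Count the 1's in a bit string (the naive 'weighting factor')."""
    return bits.count("1")


a = "101010101010"
b = "010101010101"
print(ones_count(a), ones_count(b))  # 6 6
print(hamming_distance(a, b))        # 12
```

So equal counts of 1's say nothing about how close two records are.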

Taking the data from your post, where you have the 4 records, it would
be reasonable (and I would expect) to find several records in a row
that differ from each other by a common amount, e.g.:

1 000000000000
2 010000000000
3 001000000000

Records 2 and 3 each have a distance of 1 from record 1 (and a distance
of 2 from each other).
Consider that in my current situation, with 100,000 records:

1  000000000000
..
..
55   011111111111
..
..
100,000   111111111111


As you can see, record 55 has a distance of 1 when compared to record
100,000. Without any sort of "weighting factor", it would take ages for
me to find this match.
All I want to do is reduce the number of "reads" I have to do to find
these matches, i.e. bring record 100,000 a bit closer to record 55 by
sorting on a third column. Even if I had to read 50-100 records to find
it, that would be much better than having to read over 90,000.
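
One property that might help here: the counts of 1's in two records can
never differ by more than the distance between them, so a count of 1's
can at least rule records out. The following is only a rough sketch,
assuming the records fit in memory as (id, bit string) pairs;
find_close_pairs and max_dist are made-up names, not anything from the
database. It is not a full answer, since records with similar counts can
still be far apart (see the 1010.../0101... example above).

```python
from bisect import bisect_right


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_close_pairs(records, max_dist):
    """Return (id_a, id_b, distance) for all pairs within max_dist.

    records: list of (record_id, bit_string) tuples.
    """
    # The "third column": number of 1's per record, sorted ascending.
    weighted = sorted((bits.count("1"), rec_id, bits) for rec_id, bits in records)
    weights = [w for w, _, _ in weighted]
    pairs = []
    for i, (w, id_a, bits_a) in enumerate(weighted):
        # Only records whose weight is within max_dist can possibly match.
        hi = bisect_right(weights, w + max_dist)
        for _, id_b, bits_b in weighted[i + 1:hi]:
            d = hamming_distance(bits_a, bits_b)
            if d <= max_dist:
                pairs.append((id_a, id_b, d))
    return pairs


records = [
    (1, "000000000000"),
    (55, "011111111111"),
    (100000, "111111111111"),
]
print(find_close_pairs(records, 1))  # [(55, 100000, 1)]
```

Sorted this way, record 55 (eleven 1's) sits right next to record 100,000
(twelve 1's), and record 1 is never compared with either of them.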

MJ
