Subj : Re: Generating an Index To : comp.programming From : aslanski2002 Date : Fri Sep 09 2005 05:32 am Mary yazdi: > Hi, I need some assistance with a query. To be honest, I'm not even > sure it can be done. > > I'll try to keep the information limited to only what's relevant to > what I have and what I am trying to achieve with this query. > > I have a table that contains around 100,000 records. For the sake of > this discussion, assume just two columns: > > ID Data > 1 000000000000 > 2 010000000000 > 3 011111111111 > 4 101100011101 > 5 110100011011 > 6 111100000000 > 7 111100000111 > 8 111100010111 > 9 111100011011 > 10 111111111111 > > > > The Data column contains only 1's and 0's. > The Data column is a text column, not numeric. > The Data column is actually 255 chars long. (I limited it above to 12 > for this example only) > Duplicates on the Data column are allowed and do exist. > > > With 100,000 records, you would note that in the above example record > 10 would actually be record 100,000. > > > My aim is to somehow sort the data (by creating a third column) so that > the records are in order of "string distance" or similar. In other > words, so that similar strings are located next to (or as close as > possible to) each other. > > > For example, taking the data above: > > The "distance" between record 8 and record 9 is 2 (ie Only two > positions in the Data are not the same) > The "distance" between record 3 and record 4 is 6 > > However, the "distance" between record 3 and record 10 is only 1, > but sorted in the normal fashion, there would be some 99,990 records > between them. > > > I was hoping that as I add records to this table I could calculate a > number or a code or something to create a third column that could be > indexed. Accessing records using this index would give me the records > in order of "String Distance" or similar. Hmm. Based on your "string distance" definition it can have a value in the range 0 to . So you want to sort some values in the range 0 to 255, right? However this values seem to be relative values. Take a row with all zeroes in data caloumn. You can create "string distance" values for your 100,000 records in the range [0,255]. Now take another with all one's. Now you have another set of 100,000 records in the range [0,255]. Let make it simple. Let data column be 2 digits. So we can have 4 distinc values. For 00 01, 10 (SD=1) 11 (SD=2) For 01 10, 00 (SD=1) 10 (SD=2) For 10 11, 00 (SD=1) 01 (SD=2) For 11 10, 01 (SD=1) 00 (SD=2) How would you sort them? > > I have looked up functions such as "Levenshtein - Edit String > Distance", which is fine when I have to strings to compare. I can't > see it helping in generating an index though. > > I hope I have been clear in my explanation. > > I would really appreciate any comments or discussion that could help me > achieve this. > > Thanks for your time, > > MJ .