Subj : Re: any function to handle this kind of counting? To : comp.programming From : Randy Date : Fri Jul 29 2005 07:47 pm Ross wrote: > Dear all, > > i have generated a 2000 X 1 vector, e.g. > > > 1 > 3 > 2 > 1 > 1 > 3 > 2 > 1 > 1 > 4 > ... > > and this vector indeed has a corresponding vector, e.g. > > A > A > A > A > C > C > E > E > ... > > and then i want to count, say, for "1", there are two A's, one E, ... > > it seems that a for loop can do the job but as we don't know how many > instances there are, i.e. don't know A, C, E and then what will occur in > this corresponding vector, is there any function which is devoted to handle > this kind of counting problem? thx!! Is there a function? No. The reason is that the size of your function's domain is unknown (you don't know how much state is required to represent the range of possible values in the 2nd vector). Therefore, you can't define a static closed-form function that cleanly maps your domain (1st vector OP 2nd vector) into its range (the counts). Likewise, you don't know how big the function's range data type needs to be. But just because there's no pretty solution doesn't mean there's no ugly solution. You can devise a number of ways to deal with this: 1) Use a static count data structure that's big enough to hold all possible value pairs (from 1st and 2nd vector), 2) Use a dynamic count data structure, that grows as new element values are encountered, 3) Make two passes through the vectors. The first counts up the number of different possible values in each of the vectors; then you malloc a count data structure that's the right size (NxM); then you pass through the vectors again to do the count. Option #1 might look like this: For example, n this case, if your 1st vector used single numerals and your 2nd vector used characters, you might dimension the 2D array as 10x256, and your code would look like: #define VECLEN 2000 { char vec1[VECLEN], vec2[VECLEN]; int counts[10][256]; int c; // first zero out the count array "counts" // then increment the entry in the 2D count array that corresponds // to pair of values from vec1 and vec2 at offset 1:VECLEN for (c = 0; c < VECLEN; c++) counts[vec1[c]][vec2[c]]++; } For option #2, a dynamic data structure, you could use a 2D linked list, or perhaps an ordered binary tree that you rebalance as necessary (like a skip list). Either way, rebuilding the dynamic data structure as you go is gonna be slow. Or for option #3, you make two passes through the data. In addition to the malloced 2D count array, I would also create a pair of malloced index arrays that you can use to efficiently index the 1st vector's and the 2nd vector's values into simple numeric indexes (M or N), to be used as offsets into the MxN count array, based on the value of each datum/element (see below). Then you traverse the vectors again, 1 to 2000, incrementing the count array using a TRIPLE index scheme, something like: #define VECLEN 2000 { int *idx1, *idx2; char vec1[VECLEN], vec2[VECLEN]; int **counts; int c; // first zero out the count array "counts" // then make a pass through the vectors to determine how many // different values are present in each vector // then malloc a pair of arrays that are big enough to hold all of the // different values in each of the vectors: idx1 and idx2 // then malloc a 2D count array big enough to hold all those counts. // Dimension is as: counts[idx1][idx2] // then increment the entry in the 2D count array that corresponds // to pair of values from vec1 and vec2 at offset 1:VECLEN for (c = 0; c < VECLEN; c++) counts[idx1[vec1[c]]][idx2[vec2[c]]]++; } Randy -- Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu .