Subj : Re: any function to handle this kind of counting?
To   : comp.programming
From : Randy
Date : Fri Jul 29 2005 07:47 pm

Ross wrote:
> Dear all,
> 
> i have generated a 2000 X 1 vector, e.g.
> 
> 
> 1
> 3
> 2
> 1
> 1
> 3
> 2
> 1
> 1
> 4
> ...
> 
> and this vector indeed has a corresponding vector, e.g.
> 
> A
> A
> A
> A
> C
> C
> E
> E
> ...
> 
> and then i want to count, say, for "1", there are two A's, one E, ...
> 
> it seems that a for loop can do the job but as we don't know how many 
> instances there are, i.e. don't know A, C, E and then what will occur in 
> this corresponding vector, is there any function which is devoted to handle 
> this kind of counting problem? thx!! 

Is there a function?  No.  The reason is that the size of your
function's domain is unknown (you don't know how much state is required
to represent the range of possible values in the 2nd vector).
Therefore, you can't define a static closed-form function that cleanly
maps your domain (1st vector OP 2nd vector) into its range (the counts).
 Likewise, you don't know how big the function's range data type needs
to be.

But just because there's no pretty solution doesn't mean there's no ugly
solution.  You can devise a number of ways to deal with this:

1) Use a static count data structure that's big enough to hold all
possible value pairs (from 1st and 2nd vector),

2) Use a dynamic count data structure, that grows as new element values
are encountered,

3) Make two passes through the vectors.  The first counts up the number
of different possible values in each of the vectors; then you malloc a
count data structure that's the right size (NxM); then you pass through
the vectors again to do the count.

Option #1 might look like this:

For example, n this case, if your 1st vector used single numerals and
your 2nd vector used characters, you might dimension the 2D array as
10x256, and your code would look like:

#define VECLEN 2000
{
char vec1[VECLEN], vec2[VECLEN];
int counts[10][256];
int c;

  // first zero out the count array "counts"

  // then increment the entry in the 2D count array that corresponds
  //   to pair of values from vec1 and vec2 at offset 1:VECLEN

  for (c = 0; c < VECLEN; c++)
    counts[vec1[c]][vec2[c]]++;
}

For option #2, a dynamic data structure, you could use a 2D linked list,
or perhaps an ordered binary tree that you rebalance as necessary (like
a skip list).  Either way, rebuilding the dynamic data structure as you
go is gonna be slow.

Or for option #3, you make two passes through the data.  In addition to
the malloced 2D count array, I would also create a pair of malloced
index arrays that you can use to efficiently index the 1st vector's and
the 2nd vector's values into simple numeric indexes (M or N), to be used
as offsets into the MxN count array, based on the value of each
datum/element (see below).

Then you traverse the vectors again, 1 to 2000, incrementing the count
array using a TRIPLE index scheme, something like:

#define VECLEN 2000
{
int *idx1, *idx2;
char vec1[VECLEN], vec2[VECLEN];
int **counts;
int c;

  // first zero out the count array "counts"

  // then make a pass through the vectors to determine how many
  //   different values are present in each vector

  // then malloc a pair of arrays that are big enough to hold all of the
  //   different values in each of the vectors: idx1 and idx2

  // then malloc a 2D count array big enough to hold all those counts.
  //   Dimension is as: counts[idx1][idx2]

  // then increment the entry in the 2D count array that corresponds
  //   to pair of values from vec1 and vec2 at offset 1:VECLEN
  for (c = 0; c < VECLEN; c++)
    counts[idx1[vec1[c]]][idx2[vec2[c]]]++;
}

    Randy

-- 
Randy Crawford   http://www.ruf.rice.edu/~rand   rand AT rice DOT edu

.