Subj : Re: Locating Nearest Neighbors in space (fast) To : comp.programming From : James Dow Allen Date : Sat Sep 10 2005 02:05 am > > My problem is this, I want to take each point in array 1, > > and find its 'nearest spatial neighbor' in array 2. I'll offer some miscellaneous comments, and then show an algorithm for a related problem in pseudocode. Although OP asks about points in a 3-D space, it seems nice to find an algorithm that scales nicely to k-D for large k. It certainly is less interesting to focus on the 2-D case: After all, if we're allowed to pick a small dimensionality to simplify things, we'd pick 1-D which we all know how to do! :-) I wonder what OP's application is. More common, I think, is the asymmetric case where "array 1" (aka test set) is much larger than "array 2" (aka training set). In that case one prepares static tables related to training set offline (ie, without regard to computation cost) and then handles test elements one-by-one in real-time. The "training set" may also be called "palette." (OP mentions 180 iterations. If array 2 is fixed, while array 1 varies with the iterations, then he has the "usual" case.) I certainly know little about this problem, but several years ago I stumbled upon and implemented "algorithm #1" from Huang et al, IEEE Trans Imag Proc July 1992. I think it's intended for training sets smaller than OP's but it's an interesting algorithm, and it will be interesting to hear comparisons with other approaches. * * * * * * * * * * * * * * * * If coordinates are small integers, you might use table lookups rather than multiplications to compute squares. In either case distance calculation won't take too long in 3-D space, so you might not bother with the special short-circuit done at the beginning of Step 5 below unless dimensionality is larger than 3-D. > If you use manhattan distance (x2 - x1 + y2 - y1 + z2 - z1) instead of I hate to be a smart-aleck, but your formula gives brooklyn distance, where taxis run in reverse half the time, with their meters recording negative money. > dist = sqrt( ... ) Uhh... > (By the way, I would never dream of using sqrt() to compare distances ... Yes, *many* people seem to overlook that minimizing distance-squared gives the same result and avoids square-root. I guess squared-distance may cause trouble when using a heuristic based on Triangle Inequality. (Huang's algorithm has such a heuristic, but only in a O(N) subprocess.) * * * * * * * * * * Here's "Algorithm #1" from Huang-1992. I've not seen that paper for 12 years, so I'm actually "reverse-engineering" my own source code. Buyer beware: I may have introduced errors or changes 12 years ago, or now, while rewriting in pseudo-code. For simplicity the following speaks strictly of *distance*. It is straightforward to modify the pseudocode to use *squared distance*, but there is a heuristic step where ordinary (square-root) distance seems to be needed. (Step 1 -- One-time Initialization) Set C to be a constant arbitrary point in space. (My code takes the center; for a given training set a different C may give better performance.) Take the distance from C to each y in the palette, sort these distances and place in the array A. (It may be convenient to keep *these* distances in another array as well, sorted by palette index and using ordinary (square-root) distance form, but this is just a one-time O(N) expense. That array is used in Step 6 below, just to avoid a distance calculation.) (The following substep generates N separate N-sized arrays. OP has N = 50000, so this algorithm might need major modification for OP to use it.) For each y in the palette, take the distance to each z (!= y) in the palette, sort these distances and place in the array B(y). (Step 2) Input the test point x. Take the distance to C, select the palette point y whose distance to C is closest to x's distance to C. (Use binary search on the sorted array A.) You will also want the ordinary (square-root) form of this distance dxC = |x - C|. The purpose of this step is to find an initial y close to x; the subsequent steps will iterate and find the closest y. Obviously there are other (better?) ways to select an initial close point. (Step 3) Calculate the distance from x to y; call this d. (Step 4) Set z to the first point in the sorted array B(y). This is the palette point closest to y (excluding y itself). (Step 5) Read the distance from y to z out of the array B(y). If this distance exceeds 2d (or squared-distance exceeds 4d^2), then y is the closest palette point to x, and we Terminate the algorithm (or rather go on to the next test point). (Step 6) Read the distance dzC = |z - C| from the array mentioned above. If |dzC - dxC| > d then the following calculation is unnecessary: just set z to the next point from B(y) and go back to Step 5. Calculate the distance from x to z. If this exceeds d, set z to the next point from B(y) and go back to Step 5. (Step 7) Since z is a better candidate than y, set y to z, set d to the (x,z) distance just calculated, and go to Step 4. - - - - - - - - - - - James Dow Allen .