Subj : Re: Locating Nearest Neighbors in space (fast)
To   : comp.programming
From : James Dow Allen
Date : Sat Sep 10 2005 02:05 am

> > My problem is this, I want to take each point in array 1,
> > and find its 'nearest spatial neighbor' in array 2.

I'll offer some miscellaneous comments, and then
show an algorithm for a related problem in pseudocode.

Although OP asks about points in a 3-D space, it seems
nice to find an algorithm that scales nicely to k-D
for large k.  It certainly is less interesting to focus
on the 2-D case:  After all, if we're allowed to pick
a small dimensionality to simplify things, we'd pick 1-D
which we all know how to do!  :-)

I wonder what OP's application is.  More common, I think,
is the asymmetric case where "array 1" (aka test set) is
much larger than "array 2" (aka training set).  In that
case one prepares static tables related to training set
offline (ie, without regard to computation cost) and then
handles test elements one-by-one in real-time.
The "training set" may also be called "palette."

(OP mentions 180 iterations.  If array 2 is fixed, while
array 1 varies with the iterations, then he has the "usual"
case.)

I certainly know little about this problem, but several
years ago I stumbled upon and implemented "algorithm #1"
from Huang et al, IEEE Trans Imag Proc July 1992.  I
think it's intended for training sets smaller than OP's
but it's an interesting algorithm, and it will be interesting
to hear comparisons with other approaches.

   *  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *

If coordinates are small integers, you might use table lookups
rather than multiplications to compute squares.  In either
case distance calculation won't take too long in 3-D space,
so you might not bother with the special short-circuit done at
the beginning of Step 5 below unless dimensionality is larger
than 3-D.

> If you use manhattan distance (x2 - x1 + y2 - y1 + z2 - z1) instead of

I hate to be a smart-aleck, but your formula gives brooklyn distance,
where taxis run in reverse half the time, with their meters
recording negative money.

>       dist = sqrt( ... )

Uhh...

> (By the way, I would never dream of using sqrt() to compare distances ...

Yes, *many* people seem to overlook that minimizing distance-squared
gives the same result and avoids square-root.  I guess squared-distance
may cause trouble when using a heuristic based on Triangle Inequality.
(Huang's algorithm has such a heuristic, but only in a O(N)
subprocess.)

    *  *  *  *  *  *  *  *  *  *

Here's "Algorithm #1" from Huang-1992.  I've not seen that paper for
12 years, so I'm actually "reverse-engineering" my own source code.
Buyer beware: I may have introduced errors or changes 12 years ago,
or now, while rewriting in pseudo-code.

For simplicity the following speaks strictly of *distance*.
It is straightforward to modify the pseudocode to use
*squared distance*, but there is a heuristic step where
ordinary (square-root) distance seems to be needed.

(Step 1 -- One-time Initialization)

Set C to be a constant arbitrary point in space.  (My code
takes the center; for a given training set a different
C may give better performance.)

Take the distance from C to each y in the palette, sort
these distances and place in the array A.

(It may be convenient to keep *these* distances in
another array as well, sorted by palette index and
using ordinary (square-root) distance form, but this is
just a one-time O(N) expense.  That array is used
in Step 6 below, just to avoid a distance calculation.)

(The following substep generates N separate N-sized arrays.
OP has N = 50000, so this algorithm might need major
modification for OP to use it.)
For each y in the palette, take the distance to each z (!= y) in
the palette, sort these distances and place in the array B(y).

(Step 2)

Input the test point x.  Take the distance to C, select
the palette point y whose distance to C is closest to x's
distance to C.  (Use binary search on the sorted array A.)
You will also want the ordinary (square-root) form of
this distance dxC = |x - C|.

The purpose of this step is to find an initial y close
to x; the subsequent steps will iterate and find the closest
y.  Obviously there are other (better?) ways to select
an initial close point.

(Step 3)

Calculate the distance from x to y; call this d.

(Step 4)

Set z to the first point in the sorted array B(y).
This is the palette point closest to y (excluding y itself).

(Step 5)

Read the distance from y to z out of the array B(y).
If this distance exceeds 2d (or squared-distance
exceeds 4d^2), then y is the closest palette point
to x, and we Terminate the algorithm (or rather go
on to the next test point).

(Step 6)

Read the distance dzC = |z - C| from the array mentioned
above.  If |dzC - dxC| > d then the following calculation
is unnecessary: just set z to the next point from B(y)
and go back to Step 5.

Calculate the distance from x to z.
If this exceeds d, set z to the next point from B(y)
and go back to Step 5.

(Step 7)

Since z is a better candidate than y, set y to z,
set d to the (x,z) distance just calculated, and
go to Step 4.

 -  -  -  -  -  -  -  -  -  -  -

James Dow Allen

.