Beautiful Binary Search in D
Alex Muscar
February 18, 2023
February 18, 2023: @schveiguy pointed out that the initial approach
to determining the static array size was overly complicated. I've
updated the code to incorporate their suggestion. Thanks!
Recently, while I was reading "Writing Efficient Programs", I came
across a beautiful variation of binary search and its implementation.
In this post I'll be using D's metaprogramming capabilities to
implement a generic version of the algorithm.
For the rest of the post I'm going to assume you're familiar with
binary search and its most common implementation: the bisection
method using two pointers.
A beautiful algorithm
There are many ways to implement a binary search. The most common is
probably using a "bisection search", where we track the subintervals
using two pointers, one moving from left to right, the other from
right to left.
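For contrast, here's what a typical two-pointer version looks like in
D (a generic sketch for illustration, not code from the book):

int bisect(const int[] xs, int x)
{
    // Classic bisection: lo and hi close in on the leftmost
    // position where xs[pos] >= x.
    size_t lo = 0, hi = xs.length;
    while (lo < hi)
    {
        const mid = lo + (hi - lo) / 2;
        if (xs[mid] < x) lo = mid + 1;
        else hi = mid;
    }
    return (lo < xs.length && xs[lo] == x) ? cast(int) lo : -1;
}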
Another variant is the "uniform search", where instead of using two
pointers, we use only a single pointer and a "rate of change", i.e.,
the start and the size of the subintervals. This is more subtle than
bisection search (which is itself not trivial, by the way). In Knuth's words:
It is possible to do this, but only if extreme care is paid to
the details, [...]. Simpler approaches are doomed to failure.
On the other hand, uniform search has some advantages. One of them is
that the rates of change can be precalculated and stored in a side
table. If we get the rate of change calculation right (this is the
subtle part), the search algorithm is simpler than its two-pointer
cousin.
A variation of uniform search is Shar's algorithm. It does away with
the side table, and uses power of two interval sizes.
It starts by comparing the key we are looking for, K, with K[i],
where i = 2^k, k = ⌊lg N⌋. If K < K[i], the interval sizes are
2^(k-1), 2^(k-2), ..., 1, 0. But if K > K[i], we set i = N - 2^l + 1,
where l = ⌊lg(N - 2^k + 1)⌋, adjust the interval pointer, and the
interval sizes are 2^(l-1), 2^(l-2), ..., 1, 0.
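To make this concrete, take N = 1000. Then k = ⌊lg 1000⌋ = 9, so the
first probe is at i = 2^9 = 512. If K < K[512], the remaining
interval sizes are 256, 128, ..., 1, 0. If K > K[512], then
l = ⌊lg(1000 - 512 + 1)⌋ = ⌊lg 489⌋ = 8, the pointer moves to
i = 1000 - 2^8 + 1 = 745, and the sizes are 128, 64, ..., 1, 0.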
Shar's algorithm determines the position of the entry with key K bit
by bit. Each test adds one more bit. We need one more test, after
we've determined all the bits, to see if the entry is actually in the
table.
A beautiful implementation
Bentley provides a beautiful version of Shar's algorithm in his book.
The code works for tables with 1000 elements. The code in the book is
written in Pascal, but transliterated to D it looks like this:
int bsearch1000(int[1001] xs, int x)
{
    auto i = 0;
    if (xs[512] <= x) { i = 1000 - 512 + 1; }
    if (xs[i + 256] <= x) { i += 256; }
    if (xs[i + 128] <= x) { i += 128; }
    if (xs[i + 64] <= x) { i += 64; }
    if (xs[i + 32] <= x) { i += 32; }
    if (xs[i + 16] <= x) { i += 16; }
    if (xs[i + 8] <= x) { i += 8; }
    if (xs[i + 4] <= x) { i += 4; }
    if (xs[i + 2] <= x) { i += 2; }
    if (xs[i + 1] <= x) { i += 1; }
    if (xs[i] == x) return i;
    else return 0;
}
There are a few things going on here. First, the odd array size, 1001.
Pascal arrays are one-based; D follows in C's footsteps with
zero-based arrays, so we just ignore xs[0] in this case. This is a
bug, by the way. Bentley acknowledges it, but he doesn't provide a fix
since he considers it would detract from the exposition. We can fix
it by setting i to 1 initially and making the necessary code
adjustments, which would complicate the code somewhat. Another way to
fix it is to explicitly check whether i is 0 in the last test.
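The latter fix is a one-line change to the final test:

if (i > 0 && xs[i] == x) return i;
else return 0;

This is also what the generic version below ends up doing.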
Second, the code fully unrolls the search loop. This is only possible
because we know the size of the table beforehand. The code can be
adjusted for other table sizes as needed.
What makes this code beautiful is that it's about as efficient as it
could be. It's also uniform, and relatively easy to understand if you
know about Shar's algorithm. It's an almost word-for-word rendition
of the algorithm, tailored to this particular table size.
Beautiful metaprogramming
Beautiful as it is, Bentley's code is somewhat tedious to adjust to
other table sizes, and it's easy to make mistakes while calculating
the initial value of i. The code is very repetitive and uniform. This
is a strong hint that we can automate writing it.
This is where D's powerful metaprogramming capabilities come in. If
we can determine the size of the table at compile time, we could in
principle generate the code for the unrolled loop automatically.
As it turns out, we can determine if we're dealing with a static
array, and get its size at compile time. Before I show you the code,
let's break the problem down. The algorithm has three parts:
1. The initial test, comparing the search key K with K[i] for
   i = 2^k, k = ⌊lg N⌋;
2. Determining successive bits of the candidate position by
   iterating over powers of two; and
3. Checking if we found the element we are looking for.
I'm going to show the function signature first, since it's a bit
complicated, and I'm going to explain every bit of it after.
auto bsearch(T: U[n], U, int n, int k = iflog2(n - 1))(T xs, U x)
{
While this looks a bit complicated, it's much cleaner than what we'd
get in C++. Let's take each template parameter in turn and see what
it means.
T is the table type. The constraint on T only matches static arrays.
This is where we tease out the element type, U, and crucially, the
size of the array, n.
So far, U and n are only in scope within the constraint on T. We
pull them into the template parameter list so they're available in
the function parameter list and body.
Then we determine k, the power of two where we start the search.
iflog2 stands for integer floor logarithm in base 2. This is a
regular D function, but it can be evaluated at compile time when
called with a compile time value, which is what we do here.
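For completeness, here's a minimal iflog2 (any implementation will do
as long as the compiler can run it via CTFE; similarly, the pow calls
below are assumed to be std.math.pow, though D's built-in ^^ operator
would work just as well):

int iflog2(int n)
{
    // Integer floor of log2: count how many times n halves before
    // reaching 1. A regular runtime function, but CTFE lets the
    // compiler evaluate it in the default template argument above.
    int k = 0;
    while (n > 1)
    {
        n >>= 1;
        ++k;
    }
    return k;
}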
Compared to the template parameters, the regular parameters are
almost boring: the table xs and the key x we are looking for.^1
The initial test is just the code rendition of the test in Shar's
algorithm:
auto p = 0;
if (xs[pow(2, k)] <= x) p = (n - 1) - pow(2, k) + 1;
We track the position of the candidate element in p.
Now for the fun bit, generating the power of two tests:
static foreach_reverse (int i; 0 .. k)
{
    if (xs[p + pow(2, i)] <= x) p += pow(2, i);
}
This code is remarkably short thanks to the regularity of the problem
we noted earlier, and to D's powerful metaprogramming capabilities.
A static foreach is evaluated at compile time. And crucially, it
doesn't introduce a new scope. The code inside is just "spliced" in
the code of the surrounding function. In effect, this snippet
generates a series of if statements equivalent to the ones in
bsearch1000. We use foreach_reverse to iterate over the exponents
from k - 1 down to 0 (the range 0 .. k is half-open, and we're
iterating over it in reverse). The choice of foreach_reverse as a
keyword is somewhat unfortunate. There may be a cleaner way of
achieving the same result, but I don't use D regularly, so this is
what I came up with :^).
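To see what gets spliced in, consider a hypothetical table size of
n = 8, so k = iflog2(7) = 2. The static foreach_reverse expands to
the equivalent of:

if (xs[p + pow(2, 1)] <= x) p += pow(2, 1); // i = 1
if (xs[p + pow(2, 0)] <= x) p += pow(2, 0); // i = 0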
Once we've determined all the bits of the candidate element position
p, all that's left to do is to check if the element we're looking for
is at that position.
if (p > 0 && xs[p] == x) return p;
return 0;
}
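Putting the pieces together (with iflog2 as sketched earlier, and
assuming pow comes from std.math):

import std.math : pow;

auto bsearch(T: U[n], U, int n, int k = iflog2(n - 1))(T xs, U x)
{
    auto p = 0;
    // Initial test: decide which part of the table the remaining
    // probes cover.
    if (xs[pow(2, k)] <= x) p = (n - 1) - pow(2, k) + 1;
    // Unrolled at compile time: one test per bit of the candidate
    // position.
    static foreach_reverse (int i; 0 .. k)
    {
        if (xs[p + pow(2, i)] <= x) p += pow(2, i);
    }
    // Final membership test; p == 0 doubles as "not found".
    if (p > 0 && xs[p] == x) return p;
    return 0;
}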
And with this we're done. If we check the code generated for
bsearch1000 and bsearch on Compiler Explorer, we see that it's
virtually the same. The D compiler even inlined the compile-time
constants for us.
Bonus: making it truly generic
One shortcoming of our solution is that it only works for static
arrays. It would be nice if we could detect that we're dealing with a
dynamic array, and fall back to another algorithm.
You probably won't be surprised to find out that we can; otherwise I
wouldn't be writing this section :^).
The whole implementation is a bit anticlimactic, really. We can just
use function overloading, and declare a version of bsearch that takes
a dynamic array parameter:
auto bsearch(T)(T[] xs, T x)
{
    import std.stdio : writeln;

    writeln("dynamic array: ", xs);
    // ...
    return 0;
}
The compiler will choose the right overload based on the arguments we
call the function with.
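As a quick illustration (the table contents are just for the
example):

void main()
{
    int[9] table = [0, 1, 3, 5, 7, 9, 11, 13, 15];
    assert(bsearch(table, 7) == 4); // static array: unrolled Shar search

    int[] dyn = [1, 3, 5, 7];
    bsearch(dyn, 5);                // dynamic array: fallback overload
}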
Conclusion
Shar's algorithm is a beautiful variation of binary search. If we
know the table size in advance we can write an efficient binary
search algorithm using it.
Bentley's implementation is also beautiful because it squeezes every
bit of performance from the code. The code does no more than it
should, and there is a Zen kind of beauty to it.
D's metaprogramming capabilities allowed us to take Bentley's
beautiful code, and make it more general. We can generate the code
for any table size if we know it at compile time. Otherwise we fall
back to an algorithm that can deal with dynamic arrays. Andrei
Alexandrescu calls this technique "Design by Introspection".
D is a fun language to program in. It shines for this kind of
problem, where you can make heavy use of metaprogramming. I haven't
used any other language where this problem could be expressed as
cleanly, except maybe Common Lisp. I'd be curious to see a solution
in Rust (which has a strong macro system) and Zig (which I read has
strong support for metaprogramming as well).
---------------------------------------------------------------------
1. In the example in this post the key and the element are the same:
   they are integers.