Newsgroups: comp.arch
Path: utzoo!utgpu!news-server.csri.toronto.edu!torsqnt!hybrid!scifi!bywater!uunet!kithrup!sef
From: sef@kithrup.COM (Sean Eric Fagan)
Subject: Re: new instructions
Organization: Kithrup Enterprises, Ltd.
Date: Thu, 23 May 1991 19:25:57 GMT
Message-ID: <1991May23.192557.7558@kithrup.COM>
References: <1991May22.001620.751@craycos.com> <1991May23.084258.5062@kithrup.COM> <24216@lanl.gov>

In article <24216@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>It would amaze me to find any machine (on which the test could be done) 
>where a table lookup came within an order of magnitude of a hardware
>instruction on these functions.

How about a Cyber?  A Cyber, without the pop-count hardware, takes
something like 60 cycles to do a popcount.  And the Cray has lousy
memory-access times, and isn't a byte-addressable machine.  How long would

	unsigned char *byte = (unsigned char *)&word;
	pop_count = table[byte[0]] + table[byte[1]] + table[byte[2]] +
		table[byte[3]];

take on a machine with somewhat better memory accesses?  Say, an R6000, or
even a SPARC?

And don't forget that, for serial code, the R6000 is faster than the Cray.
So that doesn't quite count as a "slow machine," does it?

So here is a way of doing pop-count, quite quickly, that doesn't require
any special instructions.  A compiler can keep the byte[x] values in
registers instead of going back to memory for them, the first reference to
table can pull quite a bit of table into a cache, and if you have pipelined
loads, it *does* go *very* quickly.  It will also work the same way, if not
faster, on later versions of the processor.  This is not true of
instructions that don't get a lot of use:  witness the 68040, which traps
the transcendental instructions and emulates them in software.
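
For anyone who wants to time this, here is a minimal sketch of the whole
thing, assuming a 32-bit word.  The names pop_table, pop_table_init, and
pop_count32 are made up for illustration, and the table is built at run
time only to keep the example short.  It uses shifts and masks instead of
the pointer cast, so the result doesn't depend on byte order or on whether
char is signed, and the bytes never have to leave a register.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
static unsigned char pop_table[256];

/*
 * Fill the table once.  The bit count of i is the bit count of i/2
 * plus the low bit of i, so each entry builds on an earlier one.
 */
static void pop_table_init(void)
{
    int i;

    pop_table[0] = 0;
    for (i = 1; i < 256; i++)
        pop_table[i] = (unsigned char)(pop_table[i >> 1] + (i & 1));
}

/*
 * Count the set bits in a 32-bit word with four table lookups,
 * one per byte.  pop_table_init() must have been called first.
 */
static unsigned pop_count32(uint32_t word)
{
    return pop_table[word & 0xff]
         + pop_table[(word >> 8) & 0xff]
         + pop_table[(word >> 16) & 0xff]
         + pop_table[(word >> 24) & 0xff];
}
```

Call pop_table_init() once at startup and pop_count32(word) after that.
A 64-bit word would just need four more lookups.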

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
