Newsgroups: comp.arch
Path: utzoo!henry
From: henry@utzoo.uucp (Henry Spencer)
Subject: Re: Understanding variations in Dhrystone performance
Message-ID: <1989May15.173631.3029@utzoo.uucp>
Organization: U of Toronto Zoology
References: <474@estevax.UUCP>
Date: Mon, 15 May 89 17:36:31 GMT

In article <474@estevax.UUCP> wck353@estevax.UUCP (HrDr Weicker Reinhold ) writes:
>... Note that
>processors with an instruction that checks a word for a null byte (such
>as AMD's 29000 and Intel's 80960) have an advantage here...

Only a small one; you can do the same check on a machine without the
fancy instruction by being clever.  Consider:

	(((x & ~0x80808080) - 0x01010101) & 0x80808080)

The result is nonzero if, and only if, some byte of x is NUL or 0x80 (the
mask turns 0x80 into zero, so for pure ASCII data the test is exact).  This
is a bit more expensive than a single instruction, but not a whole lot
if you put the constants in registers... especially on a machine where
you can juggle the code to put most of the operations in load-delay slots.
If you're into benchmarksmanship seriously, you can omit the first "&"
if you're careful to use only ASCII (or if you expect high-bit characters
to be rare and are willing to do a more precise check afterward to eliminate
false alarms).  There are a number of variations.
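
For concreteness, here is a minimal C sketch of the check above, the
shortcut without the first "&", and a precise follow-up test.  It assumes
32-bit words; the function and macro names are made up for illustration.

```c
#include <stdint.h>

#define HI_BITS 0x80808080u   /* high bit of each byte */
#define LO_ONES 0x01010101u   /* a 1 in each byte */

/* The expression from the text.  Nonzero when some byte of x is 0x00;
   also nonzero for a 0x80 byte, since the mask turns it into zero. */
static uint32_t maybe_nul(uint32_t x)
{
    return (uint32_t)(((x & ~HI_BITS) - LO_ONES) & HI_BITS);
}

/* The shortcut without the first "&".  Never misses a NUL byte, but
   gives a false alarm for any byte in the range 0x81..0xFF. */
static uint32_t maybe_nul_fast(uint32_t x)
{
    return (uint32_t)((x - LO_ONES) & HI_BITS);
}

/* Cheap filter first, then a byte-by-byte check to weed out
   false alarms.  Returns 1 if x really contains a NUL byte. */
static int has_nul_byte(uint32_t x)
{
    if (maybe_nul(x) == 0)
        return 0;
    return (x & 0x000000ffu) == 0 || (x & 0x0000ff00u) == 0 ||
           (x & 0x00ff0000u) == 0 || (x & 0xff000000u) == 0;
}
```

For example, maybe_nul(0x41424344) is zero, while maybe_nul(0x41004344)
and maybe_nul(0x41804344) are both nonzero; has_nul_byte() accepts only
the first of those two.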

>If the fixed-length and word-alignment assumption can be used, a wide
>bus that permits fast multi-word load instructions certainly does help;

Beware that there are alignment restrictions here too:  you don't want
a multi-word load to cross a page boundary unless you are sure the string
crosses it too.  Accessing the next page may cause a trap.
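
A rough sketch of the sort of guard this implies, in C.  The 4096-byte
page size and the names are assumptions for illustration only; a real
routine would use whatever page size the machine actually has.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; must be a power of two for the mask to work. */
#define PAGE_BYTES 4096u

/* Returns 1 if a load of nbytes starting at addr stays within the
   page containing addr, 0 if it would run into the next page. */
static int load_stays_in_page(uintptr_t addr, size_t nbytes)
{
    uintptr_t offset = addr & (PAGE_BYTES - 1u);  /* offset within page */
    return nbytes <= PAGE_BYTES - offset;
}
```

With these assumptions, a 16-byte load at address 0x1FF8 fails the test
(only 8 bytes remain in the page), while an 8-byte load there passes.
A naturally aligned load whose size is a power of two no larger than the
page can never straddle a page, so the guard matters only for wider or
unaligned accesses.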
-- 
Subversion, n:  a superset     |     Henry Spencer at U of Toronto Zoology
of a subset.    --J.J. Horning | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
