Path: ns-mx!uunet!cs.utexas.edu!sun-barr!ccut!wnoc-tyo-news!astemgw!icspub!rdmei!ptimtc!nntp-server.caltech.edu!toddpw
From: toddpw@mripc.cco.caltech.edu (Todd P. Whitesel)
Newsgroups: comp.sys.apple2
Subject: better multiply routine than Orca/C's
Message-ID: <1991Aug22.003307.19379@cco.caltech.edu>
Date: 22 Aug 91 00:33:07 GMT
Sender: toddpw@cco.caltech.edu (Todd P. Whitesel)
Organization: California Institute of Technology, Pasadena
Lines: 44

dlyons@Apple.COM (David A Lyons) writes:

>For the record, I should point out that there are no such beasts as
>ASL nn,S or LSR nn,S on the 65816 (although there is ADC nn,S).

Aigh! Somebody kick me. This is what it was supposed to be:

int i, m, n;	...	{ i = m*n; }

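	lda	m		; push m (ends up at 3,S)
	pha
	lda	n		; push n (ends up at 1,S)
	pha
	tsc			; A = stack pointer
	phd			; save caller's direct page
	tcd			; D = S, so DP 1 = n and DP 3 = m
	lda	#0		; running sum = 0
	bra	_a
_lp	asl	3		; m <<= 1
_a	lsr	1		; n >>= 1, low bit -> carry
	bcc	_b		; bit clear: skip the add
	clc
	adc	3		; sum += m
_b	bne	_lp		; loop while last result (n or sum) is nonzero
	pld			; restore caller's direct page
	ply			; discard n
	ply			; discard m
	sta	i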

>But your point is valid, I'm sure code similar to your post is more
>efficient than what ORCA/C is generating.

The above is a perfectly functional 16x16->16 bit unsigned multiply that uses
only the stack for scratch space (note that a tiny frame is created and then
destroyed). I originally wrote it as a mult16 function because I needed more
speed than Orca's standard multiply for expressions. (If anyone's wondering,
the LHG color cruncher does a lot of multiplies to properly weight colors
that are popular in the palette.)
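
For anyone who would rather read the algorithm in C, here is a rough
sketch of the same shift-and-add multiply. It models the arithmetic only,
not the exact flag behaviour of the 65816 code (the assembly's bne tests
whichever result came last, while this version simply stops once n is
zero). The name mult16 comes from the post; mult16_selfcheck and the
choice of <stdint.h> types are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add 16x16->16 unsigned multiply: n is shifted right one bit
   at a time, m is doubled, and m is added to the running sum whenever
   the bit shifted out of n is 1. The result is the product mod 65536. */
uint16_t mult16(uint16_t m, uint16_t n)
{
    uint16_t sum = 0;                    /* lda #0 */

    for (;;) {
        unsigned bit = n & 1u;           /* lsr 1: low bit goes to carry */
        n >>= 1;
        if (bit)
            sum = (uint16_t)(sum + m);   /* clc / adc 3 */
        if (n == 0)
            break;                       /* no bits left to process */
        m = (uint16_t)(m << 1);          /* asl 3 */
    }
    return sum;
}

/* Hypothetical helper: compare against the compiler's own multiply
   on a few edge-case operands. */
void mult16_selfcheck(void)
{
    static const uint16_t v[] = {0, 1, 2, 3, 255, 256, 0x7FFF, 0x8000, 0xFFFF};
    unsigned count = sizeof v / sizeof v[0];
    unsigned a, b;

    for (a = 0; a < count; a++)
        for (b = 0; b < count; b++)
            assert(mult16(v[a], v[b]) == (uint16_t)((uint32_t)v[a] * v[b]));
}
```

The loop runs once per significant bit of n, so putting the smaller
operand in n should keep the iteration count down.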

It's almost tight enough to put inline. BTW Randy, how much slower is it
because the DP isn't aligned? 3 cycles out of 28? Hmm.
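
One way to sanity-check the "3 out of 28" figure is to tally one pass
through the _lp loop. The sketch below assumes a 16-bit accumulator and
uses per-instruction cycle counts as recalled from the WDC 65C816
datasheet, with one extra cycle per direct-page access when the low byte
of D is nonzero. Treat those base counts as assumptions to verify; the
function names loop_cycles and loop_cycles_check are made up for this
illustration.

```c
#include <assert.h>

/* Cycles for one pass through the _lp loop (16-bit accumulator).
   dp_unaligned: nonzero if the low byte of D is not zero.
   bit_set:      nonzero if the bit shifted out of n was 1.
   Base cycle counts are assumed from the 65C816 datasheet. */
int loop_cycles(int dp_unaligned, int bit_set)
{
    int penalty = dp_unaligned ? 1 : 0;  /* +1 per direct-page access */
    int cycles = 0;

    cycles += 7 + penalty;               /* asl 3 (read-modify-write) */
    cycles += 7 + penalty;               /* lsr 1 (read-modify-write) */
    if (bit_set) {
        cycles += 2;                     /* bcc, not taken */
        cycles += 2;                     /* clc */
        cycles += 4 + penalty;           /* adc 3 */
    } else {
        cycles += 3;                     /* bcc, taken */
    }
    cycles += 3;                         /* bne, taken */
    return cycles;
}

/* Hypothetical check of the figures quoted in the post. */
void loop_cycles_check(void)
{
    assert(loop_cycles(1, 1) == 28);                      /* unaligned, add path */
    assert(loop_cycles(1, 1) - loop_cycles(0, 1) == 3);   /* alignment penalty */
}
```

Under these assumptions the add path costs 28 cycles with an unaligned
direct page and 25 with an aligned one, which agrees with the estimate
above; the no-add path has only two direct-page accesses, so it would
lose 2 cycles instead of 3.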

Todd Whitesel
toddpw @ tybalt.caltech.edu