Path: ns-mx!uunet!cs.utexas.edu!sun-barr!ccut!wnoc-tyo-news!astemgw!icspub!rdmei!ptimtc!nntp-server.caltech.edu!toddpw
From: toddpw@mripc.cco.caltech.edu (Todd P. Whitesel)
Newsgroups: comp.sys.apple2
Subject: better multiply routine than Orca/C's
Message-ID: <1991Aug22.003307.19379@cco.caltech.edu>
Date: 22 Aug 91 00:33:07 GMT
Sender: toddpw@cco.caltech.edu (Todd P. Whitesel)
Organization: California Institute of Technology, Pasadena
Lines: 44

dlyons@Apple.COM (David A Lyons) writes:

>For the record, I should point out that there are no such beasts as
>ASL nn,S or LSR nn,S on the 65816 (although there is ADC nn,S).

Aigh! Somebody kick me. This is what it was supposed to be:

int i, m, n;
...
{
    i = m*n;
}

        lda m
        pha
        lda n
        pha
        tsc
        phd
        tcd
        lda #0
        bra _a
_lp     asl 3
_a      lsr 1
        bcc _b
        clc
        adc 3
_b      bne _lp
        pld
        ply
        ply
        sta i

>But your point is valid, I'm sure code similar to your post is more
>efficient that what ORCA/C is generating.

The above is a perfectly functional 16x16->16 bit unsigned multiply that
uses only the stack for scratch space (note that a tiny frame is created
and then destroyed). I originally wrote it as a mult16 function because I
needed more speed than Orca's standard multiply for expressions. (If
anyone's wondering, the LHG color cruncher does a lot of multiplies to
properly weight colors that are popular in the palette.) It's almost tight
enough to put inline.

BTW Randy, how much slower is it because the DP isn't aligned? 3 cycles
out of 28? Hmm.

Todd Whitesel
toddpw @ tybalt.caltech.edu
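
For readers who do not read 65816 assembly, the shift-and-add idea behind the multiply loop in the post can be sketched in C. This is only an illustration under stated assumptions: the function name shift_add_mul16 is hypothetical (it is not the mult16 function mentioned in the post), and the loop follows the general shape of the assembly (test the low bit of n, conditionally add m, shift n right and m left) rather than its exact flag handling or stack frame setup.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the post: a 16x16->16 bit
 * unsigned multiply by shift-and-add. Only the low 16 bits of the
 * product are kept, as in the assembly routine. */
static uint16_t shift_add_mul16(uint16_t m, uint16_t n)
{
    uint16_t acc = 0;                   /* roughly: lda #0 */

    while (n != 0) {
        if (n & 1u) {                   /* roughly: lsr 1 / bcc _b */
            acc = (uint16_t)(acc + m);  /* roughly: clc / adc 3 */
        }
        n >>= 1;                        /* consume one bit of n */
        m = (uint16_t)(m << 1);         /* roughly: asl 3 */
    }
    return acc;
}
```

For example, shift_add_mul16(300, 200) yields 60000, and shift_add_mul16(0x8000, 3) yields 0x8000 because the product 0x18000 is truncated to 16 bits.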