Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!watserv1!watdragon!rose!ccplumb
From: ccplumb@rose.uwaterloo.ca (Colin Plumb)
Subject: Re: Mike Farren tutorial
Message-ID: <1991Apr7.000920.25630@watdragon.waterloo.edu>
Sender: news@watdragon.waterloo.edu (News Owner)
Organization: University of Waterloo
References: <dillon.5839@overload.Berkeley.CA.US>
Date: Sun, 7 Apr 1991 00:09:20 GMT
Lines: 50

dillon@overload.Berkeley.CA.US (Matthew Dillon) wrote:
>    The 1.3 OS was compiled with greenhills, I believe, which is a pretty
>    good compiler.  Would you rather the OS not have come out at all?  Do
>    you know how many YEARS it would take to write all that stuff in
>    hand assembly?  Much less debug it and enhance it.

Not to detract from your point, but I stepped through some of the
graphics.library code the other day and the quality makes me nauseous.
One particularly memorable part in ScrollVPort:
	moveq	#0,d0
	move.w	offset(a2),d0	; ViewPort->Modes, I recall...
	move.l	d0,d1
	moveq	#0,d0
	move.w	offset2(a2),d0
	; ... (play with d1 a bit) ...
	move.w	d1,offset3(a3)

Now, if the cast to 32 bits is in the source code, I admit it's a bit hard
for a compiler to look and see that it's not necessary.  But if the cast
isn't in the source, it's pretty inexcusable to extend everything to 32
bits.  And extending in d0 and then moving to d1 so you can clobber d0
again... well, I'm disgusted.
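
To illustrate why the extension can be unnecessary, here is a minimal C
sketch.  It assumes the "play with d1" step is something like an add,
which the snippet above does not actually show; the function names are
made up for illustration.  When only the low word is stored at the end,
whatever sat in the upper half of the registers cannot affect the result
for operations such as add, subtract, AND, OR or left shift.  (It would
matter for a right shift, divide or compare.)

```c
#include <stdint.h>

/* Models "moveq #0,dN / move.w mem,dN": 16-bit load, upper half cleared. */
uint32_t load_word_extended(uint16_t mem)
{
    return (uint32_t)mem;
}

/* Models a bare "move.w mem,dN": the upper half of the register keeps
   whatever stale value it already held. */
uint32_t load_word_plain(uint16_t mem, uint32_t stale)
{
    return (stale & 0xFFFF0000u) | mem;
}

/* Hypothetical stand-in for "play with d1 a bit" followed by
   "move.w d1,mem": a 32-bit add, then a store of the low word only. */
uint16_t add_and_store_word(uint32_t d1, uint32_t d0)
{
    return (uint16_t)(d1 + d0);
}
```

Feeding add_and_store_word() values from either loader gives the same
stored word, so under that assumption the two moveq instructions and the
move.l buy nothing.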

>    28% is nothing compared to finding an algorithm that gives you an order
>    of magnitude better performance, and the chance of finding such an
>    algorithm is incredibly high when you have the time to think about it.

A friend has a game in development.  It's written in assembler.  One
part is too slow - it can't scroll smoothly on a vanilla 1000.  So I'm
rewriting the main loop for speed, in C.

I'm just being careful not to do unnecessary work.  More efficient data
structures for keeping track of which enemies are off-screen, postponing
or eliminating status display updates, that sort of thing.
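
As one example of the kind of data structure I mean (a sketch only, not
the game's actual code; every name here is hypothetical): keep the
enemies in one array, partitioned so the on-screen ones come first.  The
per-frame loop then touches only items[0 .. on_screen-1] instead of
testing every enemy, and moving an enemy between the two groups is a
single swap.

```c
#include <stddef.h>

#define MAX_ENEMIES 64

struct enemy {
    int x, y;
};

struct enemy_pool {
    struct enemy items[MAX_ENEMIES];
    size_t total;      /* number of live enemies in items[] */
    size_t on_screen;  /* items[0 .. on_screen-1] are visible */
};

static void swap_enemies(struct enemy *a, struct enemy *b)
{
    struct enemy tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Enemy i has left the screen.  Caller guarantees i < pool->on_screen. */
void pool_mark_offscreen(struct enemy_pool *pool, size_t i)
{
    pool->on_screen--;
    swap_enemies(&pool->items[i], &pool->items[pool->on_screen]);
}

/* Enemy i has entered the screen.
   Caller guarantees pool->on_screen <= i < pool->total. */
void pool_mark_onscreen(struct enemy_pool *pool, size_t i)
{
    swap_enemies(&pool->items[i], &pool->items[pool->on_screen]);
    pool->on_screen++;
}
```

The swaps reorder the array, so this only suits code that doesn't hold
on to enemy indices across frames.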

Assembler has its uses - 60% of the frame time is spent in two inner
loops.  For those I first came up with fiendishly clever algorithms,
debugged them in C, then converted them to assembler and spent weeks
removing every last ounce of fat.  I'm very confident that, other than
by unrolling them even more, there are no spare cycles left anywhere in
the implementation.  More importantly, while one loop is fairly
straightforward (it's only two-dimensional forward differencing and a
bunch of rendering bit-bashing), the other involves rather non-obvious
projective geometry, and I'm absolutely positive that the entire cracker
population of West Germany couldn't do the same thing at half the speed
before the year 2000.
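
For anyone who hasn't met forward differencing, here is a generic
first-order sketch in C.  It is not the game's loop; the function name
and the choice of an affine function f(x,y) = a*x + b*y + c are
assumptions made for illustration.  The point is that once you know the
difference between neighbouring samples, the inner loop needs one add
per sample and no multiplies.

```c
/* Fill a width*height grid, row by row, with f(x,y) = a*x + b*y + c
   using additions only. */
void fill_affine(long *out, int width, int height, long a, long b, long c)
{
    long row_start = c;                 /* f(0, 0) */
    int x, y;

    for (y = 0; y < height; y++) {
        long value = row_start;
        for (x = 0; x < width; x++) {
            *out++ = value;
            value += a;                 /* f(x+1, y) - f(x, y) = a */
        }
        row_start += b;                 /* f(0, y+1) - f(0, y) = b */
    }
}
```

Higher-order functions work the same way, with one extra running
difference per order.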

Small-scale optimisations like assembler hacks are useful, but
algorithms are where you get order-of-magnitude speedups.
-- 
	-Colin
