Newsgroups: comp.sys.amiga.tech
Path: utzoo!utgpu!watserv1!watdragon!rose!ccplumb
From: ccplumb@rose.uwaterloo.ca (Colin Plumb)
Subject: Re: How do I blit this?
Message-ID: <1991Jan11.053627.29011@watdragon.waterloo.edu>
Sender: daemon@watdragon.waterloo.edu (Owner of Many System Processes)
Organization: University of Waterloo
References: <9101061957.AA20737@en.ecn.purdue.edu>
Date: Fri, 11 Jan 91 05:36:27 GMT
Lines: 102

In article <9101061957.AA20737@en.ecn.purdue.edu> bevis@ee.ecn.purdue.edu (Jeff Bevis) writes:
>Maybe it's just me, but I'm having a heck of a time trying to find a way to
>perform a specific blit operation.  I'm accessing the hardware directly
>for my purpose.  The problem goes something like this:
>
>I've got a 320x200 bitplane, and I want to blit out a 16x16 bit rectangle.
>I want it to go into a buffer area the size of 16 words.  Sounds fine.  It's
>easy if the source rectangle is on a word boundary, too.  But supposing the
>source rectangle is shifted off the word boundary, I'll have to use the bit
>shifter in the blitter.  No problem there.  Let me draw a picture of the
>hypothetical source:
>
>0123456789abcdef 0123456789abcdef
>......xxxxxxxxxx xxxxxx..........
>
>Here, the 16-bit wide block is shifted right by 6 bits.  So, I see that if I
>blit two words per line, with a shift of 10, the blitter will align the source
>information in the second word it reads on each line.  The two words the
>blitter copies into the destination (for each line) will look like this:
>
>0123456789abcdef 0123456789abcdef
>cccccccccc...... xxxxxxxxxxxxxxxx
>
>(the c's are carried from the previous line;  they could be masked out)
>
>The question is, how do I get the blitter to not WRITE the first word on
>each line?  For a two-word wide blit, I'll need a two-word wide place to put
>the result.  I only want to use my 1-word by 16-line storage for the result.
>The first word is really junk that I don't need or want.  According to what
>I see, I need to reserve space for the unused portion of the blit.

Well, I can get the blitter to read and write back the first word unchanged.
This isn't quite as good as not touching it (there is a noticeable delay,
so there is a small chance that, unless it's protected during the blit,
the processor will change it, only to have its original value stomped
back in -> BUG), but it's close.

What we need to use is the ubiquitous cookie-cutter function,
D = A*B + !A*C.  C is the destination, B is the source, and A is the mask.
Since the mask is just a fixed-length row of bits, we don't need to load it
from memory and can just write $FFFF to BLTADAT (which will never be
overwritten if we don't enable DMA for channel A).  Program an A shift of 0,
a $0000 BLTAFWM and a $FFFF BLTALWM first, and only then load $FFFF into
BLTADAT (the internal latch is *after* the shifter, so programming the
shifter after writing doesn't do anything).
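
To make the cookie-cutter logic concrete, here is a small sketch in plain C
(nothing Amiga-specific, and the function names are mine).  It applies
"take B where A is one, else keep C" to a word, and builds the 8-bit minterm
by walking the truth table, with bit (4*A + 2*B + C) holding the output for
that input combination.

```c
#include <stdint.h>

/* Cookie-cut one word: where mask bit A is 1 take source B, else keep C. */
static uint16_t cookie_cut(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)((a & b) | (~a & c));
}

/* Build the minterm byte: bit (4*A + 2*B + C) is the output for that input. */
static uint8_t cookie_cut_minterm(void)
{
    uint8_t lf = 0;
    int i;

    for (i = 0; i < 8; i++) {
        uint16_t a = (i & 4) ? 1 : 0;
        uint16_t b = (i & 2) ? 1 : 0;
        uint16_t c = (i & 1) ? 1 : 0;

        if (cookie_cut(a, b, c) & 1)
            lf |= (uint8_t)(1 << i);
    }
    return lf;
}
```

With a mask of $0000 the C word comes back untouched, with $FFFF the B word
replaces it, and the minterm works out to $CA.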

For source B, use the appropriate shift (10 bits, in your example) and modulo
(320 bits = 40 bytes per row, less 4 bytes for the two words blitted, is
36 bytes).

Source C and destination D should point to the word before the 16-word
buffer, with a modulo of -2 (-1 word).
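
The modulo arithmetic is easy to get wrong, so here it is worked in C.  This
is only a sketch: the helper name is made up, and it assumes modulos are
counted in bytes and that a 320-pixel row is 320/8 = 40 bytes.

```c
#include <stdint.h>

/*
 * Modulo in bytes: what must be added to a pointer after the last word
 * of one blitted row to reach the first word of the next row.
 */
static int blit_modulo_bytes(int row_bytes, int blit_words)
{
    return row_bytes - 2 * blit_words;
}
```

For source B that is blit_modulo_bytes(40, 2) = 36.  For C and D the "row"
is one word (2 bytes) of the buffer while the blit is two words wide, so
blit_modulo_bytes(2, 2) = -2, i.e. each line backs up one word.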

Then program the minterm, set the DMA control to USEB, USEC and USED, and
start a blit two words wide and 16 deep.  The first word of the first line
will be fetched and shifted down, but what it holds doesn't matter: the bits
of source C that get replaced with the corresponding bits of source B are
those which are one in ($FFFF (BLTADAT) & $0000 (BLTAFWM)), i.e. none of
them, so the result (the word originally read from C) is written back to D
unchanged.
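
Pulling the ascending setup described so far into one place, the register
values could be computed like this.  It is a sketch only: it touches no
hardware, the struct and function names are invented for illustration, and
the bit constants are defined locally (USEB = $0400, USEC = $0200,
USED = $0100, shifts in the top four bits, height in bits 15-6 of BLTSIZE).

```c
#include <stdint.h>

/* Control bits, defined locally for this sketch. */
enum { USEB = 0x0400, USEC = 0x0200, USED = 0x0100 };
#define MINTERM_COOKIE 0xCA         /* D = A*B + !A*C */

/* Hypothetical container for the values to be poked into the blitter. */
typedef struct {
    uint16_t bltcon0, bltcon1;
    uint16_t bltafwm, bltalwm, bltadat;
    int16_t  bltbmod, bltcmod, bltdmod;
    uint16_t bltsize;
} BlitPlan;

/* b_shift: B shift in bits (0..15); row_bytes: bytes per source row. */
static BlitPlan plan_extract_blit(int b_shift, int row_bytes)
{
    BlitPlan p;

    p.bltcon0 = (uint16_t)((0 << 12) | USEB | USEC | USED | MINTERM_COOKIE);
    p.bltcon1 = (uint16_t)(b_shift << 12);
    p.bltafwm = 0x0000;             /* first word of each line: keep C */
    p.bltalwm = 0xFFFF;             /* last word of each line: take B  */
    p.bltadat = 0xFFFF;             /* constant mask, no A DMA         */
    p.bltbmod = (int16_t)(row_bytes - 4);
    p.bltcmod = -2;                 /* back up one word per line       */
    p.bltdmod = -2;
    p.bltsize = (uint16_t)((16 << 6) | 2);  /* 16 lines, 2 words wide  */
    return p;
}
```

For the example in the question, plan_extract_blit(10, 40) gives
BLTCON0 = $07CA, BLTCON1 = $A000, BLTBMOD = 36 and BLTSIZE = $0402.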

Ka-boom, I just realized what's wrong.  The blitter has an internal pipeline,
so the results of one operation aren't written until the data for the
next one is read.  While computing the second word of one line, it will
already have fetched the first word of the next line, which sits at the very
address about to be written.  That stale word is then written back
"unchanged" on top of the result.  The upshot is that only the last word
will be changed.

You have to turn this whole thing around, going to descending mode, to
avoid the problem.  The same basic approach works, though.
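
To see why the direction matters, here is a toy model in C.  It is NOT a
cycle-accurate blitter: it just assumes each D write lands one step late,
after the next C fetch, which is my reading of the pipeline remark above.
All names are made up, B is given as already-aligned words, and the blit is
4 lines instead of 16 to keep it short.

```c
#include <stdint.h>

#define LINES 4

/* One word operation of the toy blit. */
typedef struct {
    int      addr;    /* index of the C/D word                       */
    int      take_b;  /* 1: mask all ones, D = B; 0: mask zero, D = C */
    uint16_t b;       /* aligned source word, used only if take_b     */
} Step;

/* Run the steps with every write delayed until after the next C fetch. */
static void run_delayed(uint16_t *mem, const Step *steps, int n)
{
    int have_pending = 0, pend_addr = 0, i;
    uint16_t pend_val = 0;

    for (i = 0; i < n; i++) {
        uint16_t c = mem[steps[i].addr];    /* fetch C first...       */
        if (have_pending)
            mem[pend_addr] = pend_val;      /* ...then the old D lands */
        pend_val = steps[i].take_b ? steps[i].b : c;
        pend_addr = steps[i].addr;
        have_pending = 1;
    }
    if (have_pending)
        mem[pend_addr] = pend_val;
}

/* mem[0] is the word before the buffer; mem[1..LINES] is the buffer. */
static void blit_ascending(uint16_t *mem, const uint16_t *src)
{
    Step s[2 * LINES];
    int n;

    for (n = 0; n < LINES; n++) {
        s[2*n].addr = n;       s[2*n].take_b = 0;   s[2*n].b = 0;
        s[2*n+1].addr = n + 1; s[2*n+1].take_b = 1; s[2*n+1].b = src[n];
    }
    run_delayed(mem, s, 2 * LINES);
}

/* Same words, but last line first and right-hand word first. */
static void blit_descending(uint16_t *mem, const uint16_t *src)
{
    Step s[2 * LINES];
    int n, k = 0;

    for (n = LINES - 1; n >= 0; n--) {
        s[k].addr = n + 1; s[k].take_b = 1; s[k].b = src[n]; k++;
        s[k].addr = n;     s[k].take_b = 0; s[k].b = 0;      k++;
    }
    run_delayed(mem, s, 2 * LINES);
}
```

In this model the ascending order leaves only the last buffer word changed,
while the descending order fills all of them and still leaves the word in
front of the buffer alone.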


HOWEVER, I'd like to suggest that doing it with the processor (load 32
bits, shift, and store 16) would be faster than setting up the blitter
and waiting for it to finish.  In C,

void blit16(char *srcmap, USHORT *dstptr, USHORT x, USHORT y)
{
	register ULONG *baseaddr;
	register short shiftamt;
#define ROWSIZE 40 /* bytes per row; RASSIZE would be better if I remembered argument order */
#define ROWLONGS (ROWSIZE/4) /* row stride in ULONGs, since baseaddr is a ULONG pointer */

	/* Word-aligned address of the word holding pixel (x,y); x is the left edge. */
	baseaddr = (ULONG *)(srcmap + ROWSIZE*y + ((x>>4)<<1));
	/* Big-endian 32-bit load: shifting right by 16-(x&15) puts pixel x in bit 15 */
	/* of the low word.  Note it always reads the following word too. */
	shiftamt = 16 - (x&15);
	*dstptr++ = (USHORT)(baseaddr[ 0*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 1*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 2*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 3*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 4*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 5*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 6*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 7*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 8*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[ 9*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[10*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[11*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[12*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[13*ROWLONGS]>>shiftamt);
	*dstptr++ = (USHORT)(baseaddr[14*ROWLONGS]>>shiftamt);
	*dstptr   = (USHORT)(baseaddr[15*ROWLONGS]>>shiftamt);
}

It was written this way to make the assembler code pretty obvious
so even a stupid compiler should manage to make it optimal.
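
For checking the idea on another machine, the same "load 32 bits, shift,
store 16" step can be written without depending on 68000 byte order.  This
is a sketch under my own assumptions: the bitplane is an array of 16-bit
words with pixel 0 in bit 15, rows are 20 words, x is the left edge, and
the names are made up.  It is a loop rather than unrolled, so it is for
testing, not speed.

```c
#include <stdint.h>

#define ROW_WORDS 20    /* 320 pixels / 16 pixels per word */

/*
 * Copy the 16x16 block whose top-left pixel is (x,y) into dst[0..15].
 * Always reads the word after the one holding pixel x, even when x is
 * word-aligned, so the plane must have one readable word past the block.
 */
static void extract16(const uint16_t *plane, unsigned x, unsigned y,
                      uint16_t dst[16])
{
    const uint16_t *p = plane + y * ROW_WORDS + (x >> 4);
    unsigned shift = 16 - (x & 15);     /* 1..16 */
    int row;

    for (row = 0; row < 16; row++) {
        /* Two adjacent words joined, leftmost pixel in bit 31. */
        uint32_t pair = ((uint32_t)p[0] << 16) | p[1];

        dst[row] = (uint16_t)(pair >> shift);
        p += ROW_WORDS;
    }
}
```

With every row holding $03FF, $FC00 in its first two words (pixels 6 to 21
set, as in the picture in the question), extract16(plane, 6, 0, dst) yields
$FFFF in every dst word.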
-- 
	-Colin
