Floating Point special instructions:

FMUL4X4:        OPCODE: db,f1           IIT ONLY

This instruction is available only on the IIT (Integrated Information
Technology Inc.) math processors. Takes 242 clocks.

The instruction performs a 4x4 matrix multiply in one instruction using
four banks of 8 floating point registers. The operands must be loaded
to a specific bank in a specific order.

The equation solved can be represented by:

        Xn = (A00 * Xo) + (A01 * Yo) + (A02 * Zo) + (A03 * Vo)
        Yn = (A10 * Xo) + (A11 * Yo) + (A12 * Zo) + (A13 * Vo)
        Zn = (A20 * Xo) + (A21 * Yo) + (A22 * Zo) + (A23 * Vo)
        Vn = (A30 * Xo) + (A31 * Yo) + (A32 * Zo) + (A33 * Vo)

Where Xo stands for the original X value and Xn for the result.

Operands must be loaded to the following registers in the specified
banks in the specified order.

                   Before FMUL4X4        After FMUL4X4
                        bank                 bank
        Register:   0     1     2             0
        ST(0)       Xo    A33   A31           Xn
        ST(1)       Yo    A23   A21           Yn
        ST(2)       Zo    A13   A11           Zn
        ST(3)       Vo    A03   A01           Vn
        ST(4)             A32   A30           ?
        ST(5)             A22   A20           ?
        ST(6)             A12   A10           ?
        ST(7)             A02   A00           ?

All four banks can be selected by using the bank-switching instructions,
but only banks 0, 1 and 2 make sense since bank 3 is an internal
scratchpad.

Each bank holds 8 floating point values and may be re-used with normal
instructions. Each bank acts like an independent i80287, except after a
bank switch in those cases where the initial status has not been
maintained. Pseudo-multichip operation can be performed in each bank,
and even in multiple banks at the same time (although only one
instruction will operate on one register at any given time), provided
that the active register and top register are not changed after
switching from bank to bank.

EXAMPLE:

        FINIT                           ; reset control word
        FSBP1                           ; select bank 1
        FLD DWORD PTR es:[si]           ; first original
        FLD DWORD PTR es:[si+4]         ; second original
        FLD DWORD PTR es:[si+8]         ; third original
        FSTCW WORD PTR [bx]             ; save FPU control status
        FSBP2                           ; select bank 2. NOTE:
                                        ; you will see three active
                                        ; registers in this bank when
                                        ; using a debugger
        FINIT                           ; nothing visible
        FLD DWORD PTR [si]              ; new value
        FLD DWORD PTR [si+4]            ; second new value
        FADD ST,ST(1)                   ; two values visible
        FSTP DWORD PTR [si+8]           ; one value visible
        FSBP1                           ; one original visible
        FLDCW WORD PTR [bx]             ; restore FPU status to the one
                                        ; active in bank 1, causing the
                                        ; original three values to be
                                        ; visible again in correct
                                        ; sequence

        ; ... simply continue with what you wanted to do with those
        ; numbers from es:[si]; they are still there.

        FLD DWORD PTR [si+8]            ; for instance...

This feature of the IIT chips can be used to perform complex operations
in registers when many components remain the same across a large
dataset: save the intermediate result to ONE memory location, switch to
the bank holding the next series of operands, load that ONE operand, and
continue the calculation with the operands already in that bank. This
does require another read into the new bank, but may save time and
memory space compared to memory-based operands or multiple-pass
algorithms with multiple arrays of intermediate results.
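
Returning to the FMUL4X4 register layout: the bank assignment is easy to
get wrong, so a small software model may help when preparing operands.
The sketch below is not IIT code. It is a Python model that assumes the
instruction computes the ordinary matrix-vector product, with each
result combining all four of Xo, Yo, Zo and Vo. The function names
load_banks and fmul4x4 are hypothetical, chosen only for illustration;
each bank is modelled as a list indexed like ST(0)..ST(7).

```python
# Software model of the FMUL4X4 operand layout (illustrative sketch only).
# Assumption: the result is the usual 4x4 matrix times 4-vector product.


def load_banks(a, v):
    """Lay out matrix a[row][col] and vector v = [Xo, Yo, Zo, Vo]
    as three banks, each a list indexed like ST(0)..ST(7)."""
    # Bank 0: ST(0)..ST(3) = Xo, Yo, Zo, Vo; ST(4)..ST(7) unused.
    bank0 = list(v) + [None] * 4
    # Bank 1: ST(0)..ST(3) = A33, A23, A13, A03; ST(4)..ST(7) = A32, A22, A12, A02.
    bank1 = [a[3 - i][3] for i in range(4)] + [a[3 - i][2] for i in range(4)]
    # Bank 2: ST(0)..ST(3) = A31, A21, A11, A01; ST(4)..ST(7) = A30, A20, A10, A00.
    bank2 = [a[3 - i][1] for i in range(4)] + [a[3 - i][0] for i in range(4)]
    return bank0, bank1, bank2


def fmul4x4(bank0, bank1, bank2):
    """Return the modelled bank 0 after the multiply:
    ST(0)..ST(3) = Xn, Yn, Zn, Vn; ST(4)..ST(7) undefined (None)."""
    v = bank0[:4]

    def coeff(row, col):
        # Columns 3 and 2 live in bank 1, columns 1 and 0 in bank 2.
        bank = bank1 if col >= 2 else bank2
        # Odd columns (3, 1) occupy ST(0)..ST(3); even ones ST(4)..ST(7).
        base = 0 if col % 2 == 1 else 4
        # Within each group of four, rows are stored in reverse order.
        return bank[base + (3 - row)]

    result = [sum(coeff(r, c) * v[c] for c in range(4)) for r in range(4)]
    return result + [None] * 4


# Usage: a translation by (5, 6, 7) applied to the point (1, 2, 3, 1).
a = [[1, 0, 0, 5],
     [0, 1, 0, 6],
     [0, 0, 1, 7],
     [0, 0, 0, 1]]
banks = load_banks(a, [1, 2, 3, 1])
print(fmul4x4(*banks)[:4])  # [6, 8, 10, 1]
```

Reading coefficients back through the same ST(n) positions that the
table lists, rather than straight from the matrix, means a mistake in
the layout shows up as a wrong product. Clock counts and the real
bank-switching behaviour are outside what this model covers.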