(c)  Copyright 1989 Commodore-Amiga, Inc.   All rights reserved.
The information contained herein is subject to change without notice, and 
is provided "as is" without warranty of any kind, either expressed or implied.  
The entire risk as to the use of this information is assumed by the user.




                    Introduction to 1.3 IEEE 
                   Double Precision Libraries

                         by Dale Luck




 The basic double precision IEEE library has been rewritten for V1.3.
 The new library is up to 4 times faster than the old one that came with 
 V1.2.  Several bugs were also fixed, and the routines now produce
 slightly more accurate results.  I've listed some benchmarks comparing
 the two versions of the libraries at the end of this article.

 Besides the faster software emulation of floating point, the new IEEE math
 library recognizes the 68020/68881 processor combination and uses its
 special floating point instructions when they are available.  If an
 auto-configured math resource is available, the library will use that as
 well.  Typically, this resource would point to the base of a 68881
 designed as a 16 bit IO port, but it could be another device.

 With the new library, you also have the ability to programmatically trap
 math errors such as overflow and divide by zero.  Your program can now
 ignore them or take suitable action without visiting the GURU.

 In addition to a new version of the basic mathieeedoubbas.library, a
 second library supporting transcendental functions has been added. The 
 name of the new library is mathieeedoubtrans.library, for IEEE double 
 precision transcendental library.  It supports the same functions as the
 transcendental library for the Motorola fast floating point, such as sine,
 cosine, square root, etc.  This library can also identify and use the 
 68020/68881 combination or other math resources, and it has a very fast 
 software square root routine.


When Should You Use These Libraries?

 These libraries have been benchmarked as the fastest IEEE double precision
 libraries available on the Amiga as well as outperforming almost all other
 software math libraries in the Amiga class personal workstation market.

 If you need the precision of IEEE double, and wish to have a transparent
 improvement in speed when your programs run on machines with math
 coprocessors, then you should use these libraries.  All the decision
 making is done by the library when it is first initialized and it will
 use the fastest available resources to do your math.  You only need
 one program to support a standard Amiga, a 68020/68881 Amiga, or an
 external math coprocessor Amiga.  It works automatically.





When Should You Avoid These Libraries?

 If you don't need double precision, use the Motorola fast floating point
 routines.  As you can see from the benchmarks, the Motorola routines are
 still quite a bit faster.

 If you want your math to be the fastest possible, you will want to use 
 the new instructions available on the 68020/68881 directly in your code.
 In that case, you would not need the IEEE libraries.  However, this would
 prevent your code from running on conventional 68000 based Amigas unless
 you supply different versions of your code for each configuration.



Floating Point Formats

 Here's a chart comparing the various methods of representing floating 
 point numbers used by Amiga system software.  The IEEE double precision
 libraries operate on 64 bit quantities.  The Motorola FFP libraries use
 32 bits.

 Note that there is a "hidden" bit in the fraction part of IEEE numbers. 
 Because a normalized number always has a leading 1 bit, that bit is not
 stored.  It is the "+1" shown in the Fraction row of the chart.

 

         			Motorola	Single		Double
    Field Size (bits)    	FFP		IEEE		IEEE

    Sign			1		1		1
    Exponent    		7		8		11
    Fraction	        	24		23+1		52+1

    Total			32		32		64




    Minimum (+) number    	5.4e-20		1.2e-38		2.2e-308

    Largest (+) number    	9.2e+18		3.4e+38		1.8e+308

    Minimum (+) number           n/a		1.4e-45		4.9e-324
    (denormalized)

    Denormalized means reduced in precision so that numbers closer 
    to zero can be represented.
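
 The IEEE limits in a chart like this one follow directly from the field
 sizes.  The sketch below is an illustration only, not part of the Amiga
 libraries.  It assumes a host compiler whose double is IEEE double
 precision with denormalized numbers supported, and the function names
 (pow2, min_normal, min_denormal, max_finite, print_ranges) are made up
 for this example.

```c
#include <stdio.h>

/* Return 2^n by repeated doubling or halving.  Every intermediate
   value is an exact power of two, so no rounding occurs as long as
   the result is representable. */
double pow2(int n)
{
    double r = 1.0;
    while (n > 0) { r *= 2.0; n--; }
    while (n < 0) { r /= 2.0; n++; }
    return r;
}

/* Exponent bias for a field of exp_bits bits: 127 or 1023. */
int ieee_bias(int exp_bits)
{
    return (1 << (exp_bits - 1)) - 1;
}

/* Smallest positive normalized value: 1.0 x 2^(1 - bias). */
double min_normal(int exp_bits)
{
    return pow2(1 - ieee_bias(exp_bits));
}

/* Smallest positive denormalized value: one unit in the last
   fraction bit at the minimum exponent. */
double min_denormal(int exp_bits, int frac_bits)
{
    return pow2(1 - ieee_bias(exp_bits) - frac_bits);
}

/* Largest finite value: (2 - 2^-frac_bits) x 2^bias. */
double max_finite(int exp_bits, int frac_bits)
{
    return (2.0 - pow2(-frac_bits)) * pow2(ieee_bias(exp_bits));
}

/* Print the limits for IEEE single (8 exponent bits, 23 stored
   fraction bits) and IEEE double (11 and 52). */
void print_ranges(void)
{
    printf("single: min %.1e  max %.1e  denormal min %.1e\n",
           min_normal(8), max_finite(8, 23), min_denormal(8, 23));
    printf("double: min %.1e  max %.1e  denormal min %.1e\n",
           min_normal(11), max_finite(11, 52), min_denormal(11, 52));
}
```

 Calling print_ranges() prints one line per format, which can be compared
 against the IEEE columns of the chart.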









 Floating Point Representation


    +--------+--------+--------+--------+
    |ffffffff|ffffffff|ffffffff|Seeeeeee|	Motorola FFP
    +--------+--------+--------+--------+  
  
    +--------+--------+--------+--------+
    |Seeeeeee|efffffff|ffffffff|ffffffff|	IEEE Single
    +--------+--------+--------+--------+


						IEEE Double

    +--------+--------+--------+--------+--------+--------+--------+--------+
    |Seeeeeee|eeeeffff|ffffffff|ffffffff|ffffffff|ffffffff|ffffffff|ffffffff|
    +--------+--------+--------+--------+--------+--------+--------+--------+   

    S = Sign bit
    f = fraction bits
    e = exponent bits
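
 As an illustration of the IEEE double layout, the following sketch
 splits a double into its S, e and f fields.  It is not part of the Amiga
 libraries.  It assumes the host compiler stores double as a 64 bit IEEE
 value, and the names dp_fields, dp_split and dp_print are made up for
 this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct dp_fields {
    unsigned sign;       /* S: 1 bit                               */
    unsigned exponent;   /* e: 11 bits, stored with a bias of 1023 */
    uint64_t fraction;   /* f: 52 bits, hidden leading 1 not kept  */
};

/* Copy the 64 bit pattern out of the double and mask off each field. */
struct dp_fields dp_split(double value)
{
    uint64_t bits;
    struct dp_fields f;

    memcpy(&bits, &value, sizeof bits);
    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & (((uint64_t)1 << 52) - 1);
    return f;
}

/* Print the three fields of a value in hexadecimal. */
void dp_print(double value)
{
    struct dp_fields f = dp_split(value);

    printf("%g: S=%u e=0x%03X f=0x%013llX\n",
           value, f.sign, f.exponent, (unsigned long long)f.fraction);
}
```

 For example, 1.0 splits into S=0, e=1023 (the bias alone) and f=0, since
 the only fraction bit that is set is the hidden one.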


 The scheme used in IEEE floating point representation includes a few
 "special" numbers.  Certain patterns of bits are used to represent
 exceptions: 

	o  NAN  	'Not A Number'	(result of 0/0)

	o  INF	        'Infinity'	(result of 1/0)

 There are other assigned patterns in addition to these two.
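
 In IEEE double precision, the special patterns are recognized by the
 exponent field.  An exponent of all ones marks INF (fraction zero) or
 NAN (fraction nonzero), and an exponent of all zeros marks zero or a
 denormalized number.  The sketch below classifies a raw 64 bit pattern
 on that basis.  The names dp_class and dp_classify_bits are made up for
 this example and are not library functions.

```c
#include <stdint.h>

enum dp_class { DP_ZERO, DP_DENORMAL, DP_NORMAL, DP_INF, DP_NAN };

/* Classify a 64 bit IEEE double pattern from its exponent and
   fraction fields.  The sign bit does not affect the class. */
enum dp_class dp_classify_bits(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FF;
    uint64_t fraction = bits & (((uint64_t)1 << 52) - 1);

    if (exponent == 0x7FF)              /* all ones: INF or NAN */
        return fraction == 0 ? DP_INF : DP_NAN;
    if (exponent == 0)                  /* all zeros: zero or denormal */
        return fraction == 0 ? DP_ZERO : DP_DENORMAL;
    return DP_NORMAL;
}
```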





Using the Libraries

 The new IEEE libraries should be placed in the LIBS: directory.  Use
 the mathieeedoubbas.library to replace the old library of that same
 name.  The mathieeedoubtrans.library is an all new addition.

 Code that calls routines in these libraries will have to be linked
 to the new .lib files which also have awkward names.  They are
 mathieeedoubbas_lib.lib and mathieeedoubtrans_lib.lib.  And there
 is a new .fd file for the transcendental functions.

 Using the IEEE routines is straightforward - they are a standard
 library.  Simply open the library, use the routines and close the
 library when you are done.  For example, to use the Sine routine:





        /* IEEE Sine Routine                                    */
        /* Compile under Lattice 4.0  by linking with   c.o +   */
        /* mathieeedoubbas_lib.lib + mathieeedoubtrans_lib.lib  */
        /* + lcm.lib + lc.lib + amiga.lib                       */

	       double IEEEDPSin();
	extern int    MathIeeeDoubBasBase;
               int    MathIeeeDoubTransBase;

	void
	main()
	{

	double x=0;

	/* Open both IEEE libraries; give up if either one is missing. */
	/* An exit code of 20 is the AmigaDOS failure return code.     */
	MathIeeeDoubBasBase=OpenLibrary("mathieeedoubbas.library",0);
	if(MathIeeeDoubBasBase==0)  exit(20);

	MathIeeeDoubTransBase=OpenLibrary("mathieeedoubtrans.library",0);
	if(MathIeeeDoubTransBase==0)
	  {
	  CloseLibrary(MathIeeeDoubBasBase);
	  exit(20);
	  }

	/* IEEEDPSin takes its argument in radians, not degrees. */
	x=IEEEDPSin( (double) 60 );
	printf("sin(60 radians) = %e\n",x);	

	/* Close the libraries in the reverse order of opening. */
	CloseLibrary(MathIeeeDoubTransBase);
	CloseLibrary(MathIeeeDoubBasBase);

	}
 


Hardware Developer Information

 To make use of CBM's standard peripheral support for the 68881, you must design 
 your peripheral to autoconfig.  Your autoconfig software must create a 
 resource and add it to the resource list.  The name of this resource is 
 "MathIEEE.resource".  The IEEE library will attempt to open this resource. 
 If it finds it, it will extract the BaseAddr pointer and copy it into its 
 library structure.  If the BaseAddr pointer is non-null it will use a 
 different list of routine entry points when the IEEE library is initialized.

 After the IEEE library is initialized, the library again checks the resource 
 for alternate function bits in Flags of the resource. The Basic library only 
 checks the DblBasAlt bit, and the transcendental library only checks the
 DblTransAlt bit.  If they are set, the library routine will call the function
 whose address is in the corresponding Init field.  The arguments passed are 
 a6=sysbase, a1=resource and a2=mathlibrary.

 If your device is not a 68881, you may need to use these alternate Init
 entries.  There are separate bits for different library capabilities in
 case your math resource is only able to handle a limited set of functions.
 This will let you tie in a math processor that may only provide addition,
 subtraction, multiplication and division functions.  The rest of the
 software will use it transparently by calling your alternate routines.

 The Amiga does not provide for arbitrating a math accelerator in a
 multitasking environment.  Therefore, you must provide your own support for
 this when your device autoconfigs.  The only exception is the 68020/68881
 combination, for which support has been standard since V1.2.  Arbitration
 usually involves saving and restoring the state of your hardware device
 between task switches.

 We recommend that you look at the tc_Switch and tc_Launch vectors in the task
 data structure. These are called each time control transfers from one task to
 another.  Remember not to assume that you are the only process needing to use
 those vectors.
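
 One way to avoid assuming sole ownership of those vectors is to remember
 whatever was installed before you and call it from your own routine.
 The sketch below shows that chaining pattern in portable C.  It is a
 mock, not Amiga code: mock_task stands in for the real task structure,
 the counters stand in for real state save and restore code, and every
 name other than tc_Switch and tc_Launch is made up for this example.

```c
#include <stddef.h>

typedef void (*hook_fn)(void);

/* Mock of the two task fields named in the text. */
struct mock_task {
    hook_fn tc_Switch;   /* called when the task loses the CPU   */
    hook_fn tc_Launch;   /* called when the task regains the CPU */
};

hook_fn prev_switch = NULL;   /* vectors found at install time */
hook_fn prev_launch = NULL;
int state_saves = 0;          /* stand-ins for real hardware work */
int state_restores = 0;
int earlier_calls = 0;

/* Example of a routine some other code installed before us. */
void earlier_hook(void) { earlier_calls++; }

/* Save our device state, then pass control to the earlier owner. */
void my_switch(void)
{
    state_saves++;
    if (prev_switch != NULL) prev_switch();
}

/* Let the earlier owner run, then restore our device state. */
void my_launch(void)
{
    if (prev_launch != NULL) prev_launch();
    state_restores++;
}

/* Install our routines, keeping the old ones for chaining. */
void install_hooks(struct mock_task *task)
{
    prev_switch = task->tc_Switch;
    prev_launch = task->tc_Launch;
    task->tc_Switch = my_switch;
    task->tc_Launch = my_launch;
}

/* Put the original vectors back when we are finished. */
void remove_hooks(struct mock_task *task)
{
    task->tc_Switch = prev_switch;
    task->tc_Launch = prev_launch;
}
```

 A real implementation would keep the saved vectors per task rather than
 in globals, since each task using the device has its own pair.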

 The resource data structure is as follows:

 STRUCTURE  MathIEEE,LN_SIZE
        UWORD   MathIEEE_Flags
        ULONG   MathIEEE_BaseAddr       ; for standard 68881 support
        ULONG   MathIEEE_DblBasInit     ; something else besides 68881
        ULONG   MathIEEE_DblTransInit   ; something else besides 68881
        ULONG   MathIEEE_SnglBasInit    ; something else besides 68881
        ULONG   MathIEEE_SnglTransInit  ; something else besides 68881
 LABEL  MathIEEE_sizeof
*
*       Bits for MathIEEE_Flags.  All unassigned bits must be 0
*
        BITDEF  MathIEEE,DblBasAlt,0            ; alternate Basic library
        BITDEF  MathIEEE,DblTransAlt,1          ; alternate Trans library
        BITDEF  MathIEEE,SnglBasAlt,2           ; alternate Basic library
        BITDEF  MathIEEE,SnglTransAlt,3         ; alternate Trans library
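
 The decisions described in this section can be sketched in portable C as
 follows.  This is a mock for illustration, not the library's source.
 The struct mirrors only some of the resource fields and omits the list
 node.  The MathIEEEF_ masks follow the usual BITDEF naming for the bit
 definitions listed above.  The function names and the Init function
 signature are assumptions made for this example.  Detection of a
 68020/68881 on the main processor bus is a separate matter and is not
 shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag masks for bits 0..3 of the Flags field. */
#define MathIEEEF_DblBasAlt     (1u << 0)
#define MathIEEEF_DblTransAlt   (1u << 1)
#define MathIEEEF_SnglBasAlt    (1u << 2)
#define MathIEEEF_SnglTransAlt  (1u << 3)

struct mock_math_resource;
typedef void (*alt_init_fn)(struct mock_math_resource *res, void *math_lib);

/* Simplified mirror of the resource (list node omitted). */
struct mock_math_resource {
    uint16_t    flags;          /* MathIEEE_Flags        */
    void       *base_addr;      /* MathIEEE_BaseAddr     */
    alt_init_fn dbl_bas_init;   /* MathIEEE_DblBasInit   */
    alt_init_fn dbl_trans_init; /* MathIEEE_DblTransInit */
};

enum math_path { PATH_SOFTWARE, PATH_RESOURCE_68881 };

/* A non-null BaseAddr selects the entry points that drive a 68881
   at that address; otherwise software emulation is used. */
enum math_path choose_entry_points(const struct mock_math_resource *res)
{
    if (res != NULL && res->base_addr != NULL)
        return PATH_RESOURCE_68881;
    return PATH_SOFTWARE;
}

/* True if the resource exists and the given Alt bit is set. */
int wants_alt_init(const struct mock_math_resource *res, unsigned mask)
{
    return res != NULL && (res->flags & mask) != 0;
}

/* After normal initialization, the basic library checks only its
   own bit and calls the matching Init entry if the bit is set. */
void run_dbl_bas_alt_init(struct mock_math_resource *res, void *math_lib)
{
    if (wants_alt_init(res, MathIEEEF_DblBasAlt) && res->dbl_bas_init != NULL)
        res->dbl_bas_init(res, math_lib);
}
```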


 The MathIEEE resource structure may grow in the future.  Extensions will be 
 added as Commodore-Amiga adds new standards such as 80 bit extended format.

 The 'Init' entries in the math resource structure are only used if the 
 corresponding Bit is set in the Flags field.  So if you are just a 68881, 
 you do not need the Init entries.  Make sure you have cleared the Flags field.
 This should allow us to add Extended Precision later.  For Init users, make 
 sure you add yourself into the Open/Close/Expunge vectors for this library.




 The library structure that is used is tentatively laid out as shown below.
 I say tentatively because the names of the entries may still change.  The
 order of entries, their usage and size will not change.  Naturally, we may
 add new fields to the end.



    STRUCTURE  MI,LIB_SIZE      ; Standard library node
        UBYTE   io8_Flags       ; is this 68881?
        UBYTE   io8_pad         ; line up to next 32bit boundary
        ULONG   io8_68881       ; ptr to io68881 base
        ULONG   io8_SysLib      ; ptr to SysBase
        ULONG   io8_SegList     ; ptr to this SegList
        ULONG   io8_Resource    ; ptr to mathIEEE.resource
        ULONG   io8_opentask    ; called when task opens
        ULONG   io8_closetask   ; called when task closes
    LABEL   MI_SIZE


 Of particular interest to hardware developers are the opentask and closetask 
 entry points.  These functions will be called when a task calls OpenLibrary 
 and CloseLibrary.  This will give the vendor the opportunity to set up any 
 per task initialization necessary.  The Amiga library presently sets them up 
 as NOPs in the case of straight emulation.  It puts the 68881 initialization 
 code in there for the 68020/68881 as well as the peripheral 68881.  That 
 initialization code currently sets up rounding modes and interrupt requests.

 If you need to override the defaults, you will have to set the appropriate
 Alt bits in the Resource structure and overwrite the opentask/closetask
 fields when your AltInit function is called.  The OpenLibrary routine checks 
 the return value of opentask for errors.  If d0.l is nonzero, then 
 OpenLibrary will return 0 to the task trying to open the library.
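
 The opentask error convention can be sketched as follows.  This is a
 portable mock of the behavior described above, not the library's open
 routine.  The struct holds only the two hook fields plus an open count,
 and all names are made up for this example.

```c
#include <stddef.h>

/* Mock of the library fields that matter for this sketch. */
struct mock_math_lib {
    long (*opentask)(void);    /* returns nonzero on failure */
    long (*closetask)(void);
    int  open_count;
};

/* Mimic the open path: run the per task hook first and refuse
   the open (return NULL) if the hook reports an error. */
struct mock_math_lib *mock_open(struct mock_math_lib *lib)
{
    if (lib->opentask != NULL && lib->opentask() != 0)
        return NULL;
    lib->open_count++;
    return lib;
}

/* Mimic the close path: run the per task hook, then drop the count. */
void mock_close(struct mock_math_lib *lib)
{
    if (lib->closetask != NULL)
        lib->closetask();
    lib->open_count--;
}

/* Two sample hooks: one that succeeds and one that fails. */
long opentask_ok(void)   { return 0; }
long opentask_fail(void) { return 1; }
```

 A vendor hook that cannot set up its hardware for the calling task
 would behave like opentask_fail here, and the caller would see the open
 fail.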


 On the 68020/68881 some new exceptions are generated.  Unfortunately the 
 V1.2 operating system does not properly initialize these. For users of the 
 new ramkick/A2024 system, the fixes have been added to the exec.library.  
 For the rest we provide a program to run during your startup sequence to 
 initialize the vectors and redirect processing back to exec when the new 
 exceptions occur.  This is only necessary on 68020/68881 systems.



Benchmarks

 This section contains some benchmarks comparing the performance of the
 various Amiga math libraries.  Use these as a guide when selecting the
 math routines to be used for your application.

 All these benchmarks show the results when compiling under Green Hills C.
 The results you get with another compiler will vary.













 How does V1.3 stack up to V1.2?
   A Comparison of Software

			V1.2	  V1.3	   V1.2
                        IEEE      IEEE    MathFFP
   Float		
      10000	(secs)	92.14 	  45.22	  17.64
     256000	(secs) 580.58    282.52   136.78

   Calcpi
	(kflops/sec)	2.07	   4.93	  11.14
         PI error     -5.5e-14	-1.4e-11  6.1e-5

   Whetstone
   	(kwhets/sec)	12	  24	  78

   Savage
	(secs)		N/A	 470	  98.2

   System tested:  A1000, 512k chip memory, 1 external floppy
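
 To turn rows of this table into speedup factors, divide the old time by
 the new time for rows measured in seconds, and the new rate by the old
 rate for rows measured in kflops or kwhets.  The helper names below are
 made up for this example.  The figures are copied from the V1.2 IEEE and
 V1.3 IEEE columns above.

```c
#include <stdio.h>

/* Speedup for timed rows: a lower time is better. */
double time_speedup(double old_secs, double new_secs)
{
    return old_secs / new_secs;
}

/* Speedup for rate rows: a higher rate is better. */
double rate_speedup(double old_rate, double new_rate)
{
    return new_rate / old_rate;
}

/* Print the V1.2 IEEE to V1.3 IEEE speedup for each benchmark
   that has a figure in both columns. */
void print_speedups(void)
{
    printf("Float 10000:  %.2fx\n", time_speedup(92.14, 45.22));
    printf("Float 256000: %.2fx\n", time_speedup(580.58, 282.52));
    printf("Calcpi:       %.2fx\n", rate_speedup(2.07, 4.93));
    printf("Whetstone:    %.2fx\n", rate_speedup(12.0, 24.0));
}
```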



 Transparent Increase in Speed

			V1.3/000	000/881		020/881
   Float		
      10000    (secs)	 45.22		19.18		 13.46
     256000    (secs)	282.52 	       179.98		122.46

   Calcpi
	(kflops/sec)	 4.93		7.89		11.78
	PI error	-1.39e-11	-2.78e-11	-2.78e-11

   Whetstone
   	(kwhets/sec)	24		81		124

   Savage
	(secs)	       470		20.4		15.2
	error		-6.9e-7		-5.6e-7		-5.6e-7



   Systems tested:

      V1.3/000 was an A1000 with 512k.
      000/881  was an A1000 with 512k plus 2M and Microbotic's "881 Starmath"
      020/881  was an A2000 with CSA's 68020/68881, 2M memory and a 2090a

 Penultimate Speed Tests:
 Comparison of Speed Using 
  Inline F instructions

			V1.3/000	020/881
   Float		
      10000    (secs)	 45.22		 0.26*
     256000    (secs)	282.52 	 	15.86

   Calcpi
	(kflops/sec)	  4.93		81.3

   Whetstone
   	(kwhets/sec)	   24		459

   Savage
	(secs)		   470		  4.6

   Systems tested:

     V1.3/000 was an A1000 with 512k and 1 external floppy.
     020/881  was an A2000 with  CSA's 68020/881, 2M memory and a 2090a.
 
     Note:  Under this test, the 020/881 test code will not run on a 
     standard 68000 based system.

     * The Green Hills compiler may have optimized this benchmark to nothing.


 Penultimate Speed Tests, II:
   Inline Results With  
   Fast 32-Bit Memory
                                                           Inline       Inline
                        020        020/881    030/882     020/881      030/882
   Float		
      10000    (secs)	25.6	     6.08	 5.16	   0.24*	 0.18*
     256000    (secs)	168.74	    54.08	47.52	  15.28		13.16

   Calcpi
	(kflops/sec)	8.44	    25.29	28.8	  90.09		114.42

   Whetstone		
   	(kwhets/sec)	39	   263		291	 673		889

   Savage
	(secs)	       	320.8	     8.4	7.6	   4.46		3.98

   Systems tested:

     020     was an A2000 with CSA's 020 board running at 14 MHz.
     020/881 was an A2000 with CSA's 020/881 board running at 14 MHz.
     030/882 was an A2000 with CSA's 030/882 board running at 14/16 MHz.

     * The Green Hills compiler may have optimized this benchmark to nothing.





