From nobody@FreeBSD.ORG  Mon May 15 02:39:59 2000
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 8AE2137B52F; Mon, 15 May 2000 02:39:59 -0700 (PDT)
Message-Id: <20000515093959.8AE2137B52F@hub.freebsd.org>
Date: Mon, 15 May 2000 02:39:59 -0700 (PDT)
From: neis@cdc.informatik.tu-darmstadt.de
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@FreeBSD.org
Subject: libm's log1p not working as designed on Intel architectures.
X-Send-Pr-Version: www-1.0

>Number:         18560
>Category:       i386
>Synopsis:       libm's log1p not working as designed on Intel architectures.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon May 15 02:40:03 PDT 2000
>Closed-Date:    Wed Dec 19 17:40:37 PST 2001
>Last-Modified:  Wed Dec 19 17:46:45 PST 2001
>Originator:     Stefan Neis
>Release:        Actually none.
>Organization:
>Environment:
>Description:
While "porting" current libm to OS/2 (i.e. recompiling and running
various tests), I noticed that assembler code for log1p(x) is basically
as follows: Compute x+1, then call fyl2x.

This plainly contradicts the man page:
> log1p(x) returns a value equivalent to `log (1 +  x)'.  It
> is computed in a way that is accurate even if the value of
> x is near zero.

IMHO, you really need to use fyl2xp1 whenever x is sufficiently close to
0 for that instruction to work (unfortunately, it only works in a
rather small interval).

On a related issue, the various wrapper functions around the assembler
code add an extra function call, which causes a measurable performance
loss.
I have been able to speed up e.g. "acos" by more than ten percent
by replacing the assembler routine __ieee754_acos with inline
assembler code.
>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: neis@cdc.informatik.tu-darmstadt.de
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/18560: libm's log1p not working as designed on Intel
 architectures.
Date: Fri, 19 May 2000 19:43:30 +1000 (EST)

 On Mon, 15 May 2000 neis@cdc.informatik.tu-darmstadt.de wrote:
 
 > While "porting" current libm to OS/2 (i.e. recompiling and running
 > various tests), I noticed that assembler code for log1p(x) is basically
 > as follows: Compute x+1, then call fyl2x.
 
 This is only an efficiency bug under FreeBSD.  log1p.S is too broken to
 use, so FreeBSD doesn't use it:
 
 RCS file: /home/ncvs/src/lib/msun/Makefile,v
 Working file: Makefile
 head: 1.23
 ...
 ----------------------------
 revision 1.14
 date: 1997/02/15 05:21:16;  author: bde;  state: Exp;  lines: +4 -2
 Disabled the i387 version of log1p().  It just evaluates log(1 + x).
 This defeats the point of log1p().  ucbtest reports errors of +-5e+15
 ULPs.  A correct version would use the i387 fyl2xp1 instruction for
 small x and maybe scale to small x.  The C version does the scaling
 reasonably efficiently, and fyl2xp1 is slow (at least on P5s), so not
 much is lost by always using the C version (only 25% for small x even
 with the broken i387 version; 50% for large x).
 ----------------------------
 
 You can find a correct version in glibc (version 2.1.1 at least).
 
 > On a related issue, the various wrapper functions around assembler
 > code cause an additional function call which really causes a
 > performance loss.
 > I have been able to speed up e.g. "acos" by more than ten percent
 > by replacing the assembler routine __ieee754_acos with inline
 > assembler code.
 
 A non-inline version of (the i387 version of) __ieee754_acos() is only
 about 2% slower than the inline version.  (Inlining acos doesn't help
 much because the inlined code is quite large and slow; the speedup for
 sqrt() is relatively much larger.)  I've never worried much about even
 10% speedups for inlining.  Usually you only get the 10% speedups for
 simplistic benchmarks where everything is cached.
 
 Bruce
 
 

From: Stefan Neis <neis@cdc.informatik.tu-darmstadt.de>
To: Bruce Evans <bde@zeta.org.au>
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/18560: libm's log1p not working as designed on Intel
 architectures.
Date: Sat, 20 May 2000 18:27:53 +0200 (MET DST)

 	Hi,
 
 > This is only an efficiency bug under FreeBSD.  log1p.S is too broken to
 > use, so FreeBSD doesn't use it:
 > 
 > You can find a correct version in glibc (version 2.1.1 at least).
 
 I see.
 
 > (Inlining acos doesn't help
 > much because the inlined code is quite large and slow;
 
 Interesting. I was under the impression that the inlined code is rather
 small. :-? 
 
 double acos_inline(double x)
 {
         register double z;
         /* z = (1+x)*(1-x); bare fmul assembles as fmulp and pops st(1) */
         asm("fmul":  "=t" (z) : "0" (1+x), "u" (1-x) : "st(1)" );
         asm("fsqrt": "=t" (z) : "0" (z) );
         /* fpatan = atan2(st(1), st(0)) and pops, so st(0) must hold x */
         asm("fpatan": "=t" (z) : "0" (x), "u" (z) : "st(1)" );
         if(_LIB_VERSION == _IEEE_ || isnan(x)) return z;
         if(fabs(x)>1.0) {
                 return __kernel_standard(x,x,1); /* acos(|x|>1) */
         } else
                 return z;
 }
 
 Anyway, thanks for taking the time to answer a not strictly BSD related
 question.
 
 	Regards,
 		Stefan
 
 
State-Changed-From-To: open->feedback 
State-Changed-By: iedowse 
State-Changed-When: Sun Dec 2 13:09:31 PST 2001 
State-Changed-Why:  

What was the conclusion here - is the current behaviour a bug or 
not? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=18560 
State-Changed-From-To: feedback->closed 
State-Changed-By: iedowse 
State-Changed-When: Wed Dec 19 17:40:37 PST 2001 
State-Changed-Why:  

My interpretation of Bruce's reply below is that this should be 
closed :-) 

In message <20011204122432.S4382-100000@gamplex.bde.org>, Bruce Evans writes: 
>Unformatted:
 >The low-quality assembler version is not used, so the current 
 >behaviour is at most an efficiency bug. 
 > 
 >Bruce 
 
 
 http://www.FreeBSD.org/cgi/query-pr.cgi?pr=18560 
