From dtc@scrooge.ee.swin.oz.au  Sun Sep 21 07:52:30 1997
Received: from scrooge.ee.swin.oz.au (scrooge.ee.swin.oz.au [136.186.4.20])
          by hub.freebsd.org (8.8.7/8.8.7) with SMTP id HAA03141
          for <FreeBSD-gnats-submit@freebsd.org>; Sun, 21 Sep 1997 07:52:25 -0700 (PDT)
Received: (from dtc@localhost) by scrooge.ee.swin.oz.au (8.6.9/8.6.9) id AAA06008 for FreeBSD-gnats-submit@freebsd.org; Mon, 22 Sep 1997 00:55:38 +1000
Message-Id: <199709211455.AAA06008@scrooge.ee.swin.oz.au>
Date: Mon, 22 Sep 1997 00:55:38 +1000 (EST)
From: Douglas Thomas Crosher  <dtc@scrooge.ee.swin.oz.au>
To: FreeBSD-gnats-submit@freebsd.org
Subject: Patch to pass NPX status word in signal code.

>Number:         4597
>Category:       kern
>Synopsis:       Patch to pass NPX status word in signal code on SIGFPE.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Sun Sep 21 08:00:01 PDT 1997
>Closed-Date:    Wed Oct 10 12:48:10 PDT 2001
>Last-Modified:  Wed Oct 10 12:50:21 PDT 2001
>Originator:     Douglas Crosher
>Release:        FreeBSD 3.0-CURRENT i386
>Organization:
Swinburne University
>Environment:

N/A

>Description:

Presently it's not possible for a process to determine the cause of an
FP exception it generates. The exception flags in the NPX status word
are cleared after an exception, so the process has no way of
determining which exception occurred. The suggested change would pass
the NPX status word in the signal code, allowing the process to
determine the cause of the exception.

Being able to determine the cause of an FP exception is critical for
some numerical code; e.g. it may be used to adjust an algorithm if an
overflow occurs. Further, it may indicate a bug in an algorithm
(e.g. an unexpected division by zero), and knowing the cause of an
exception can help locate the problem.

Algorithms may also mix monitoring of the accrued exceptions with
having some traps enabled. Clearing all the exceptions on a SIGFPE
also clears the accrued exceptions, and may break such code unless
the application restores them after a longjmp; but if the application
doesn't have access to them, it can't restore them.

It is currently possible for an external debugger (gdb) to obtain the
NPX status word and report the exception. But this doesn't help an
application that needs to know the cause of an exception it produces,
or higher level languages that have their own debuggers to handle
exceptions (e.g. CMU Common Lisp).

Some operating systems save the FP state in the signal context. This
was discussed on the freebsd-current list some time ago, and the
general consensus seemed to be that the overhead of doing so is
prohibitive. This may be the case, and if an application needs to
save the NPX state it can do so in its SIGFPE handler; however, the
exception flags in the status word are currently cleared and lost.

>How-To-Repeat:

N/A

>Fix:

The patch below passes the NPX status word in the signal
code. This allows an application to determine the cause of the
exception and to restore any accrued exceptions after a longjmp.
 
*** npx.c.1	Thu Aug 21 23:10:27 1997
--- npx.c	Sun Sep 21 21:07:27 1997
***************
*** 528,542 ****
  		 * just before it is used).
  		 */
  		curproc->p_md.md_regs = (struct trapframe *)&frame->if_es;
- #ifdef notyet
  		/*
! 		 * Encode the appropriate code for detailed information on
! 		 * this exception.
  		 */
! 		code = XXX_ENCODE(curpcb->pcb_savefpu.sv_ex_sw);
! #else
! 		code = 0;	/* XXX */
! #endif
  		trapsignal(curproc, SIGFPE, code);
  	} else {
  		/*
--- 528,538 ----
  		 * just before it is used).
  		 */
  		curproc->p_md.md_regs = (struct trapframe *)&frame->if_es;
  		/*
! 		 * Pass the NPX status word in the signal code so the
! 		 * process can determine the cause of the exception.
  		 */
! 		code = curpcb->pcb_savefpu.sv_ex_sw;
  		trapsignal(curproc, SIGFPE, code);
  	} else {
  		/*

>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: dtc@scrooge.ee.swin.oz.au, FreeBSD-gnats-submit@FreeBSD.ORG
Cc:
Subject: Re: kern/4597: Patch to pass NPX status word in signal code.
Date: Mon, 22 Sep 1997 04:48:36 +1000

 >determining which exception occurred. The suggested change would have
 >the NPX status word passed in the signal code allowing the process to
 >determine the cause of the exception.
 
 One minor problem: the raw npx status word has nothing to do with the
 codes in <machine/trap.h> (and these codes are incomplete, not to
 mention inadequate since they can't be ORed together).  I think the
 code should be fairly raw and <machine/trap.h> should be changed to
 match it.
 
 >Algorithms may also be mixing the monitoring the accrued exceptions
 >with some traps enabled. The clearing of all the exceptions on a
 >SIGFPE clears the accrued exceptions and may break such code unless
 >the accrued exceptions are restored by the application on a longjump;
 >but if the application doesn't have access to these it can't restore
 >them.
 
 My version of npx.c has an option to avoid clearing the exception
 bits.  The idea is to pass the exact state that caused the exception
 to the SIGFPE handler and let it worry about possible nested SIGFPEs.
 Naive handlers will work as follows:
 
 1. If SIGFPE is SIG_IGN'ed, or the handler just returns, the behaviour
 is undefined, as before, and more obviously broken than before, since
 nothing clears the exception bits, so SIGFPEs will occur endlessly on
 the FPU instruction after the one that set the exception bit.  Clearing
 the exception bits allows broken programs to make progress, often with
 a corrupt FPU stack.
 
 2. If the handler calls longjmp(), then longjmp() will clear the exception
 bits.  longjmp() only attempts to preserve the part of the FPU state
 necessary for standard C programs.
 
 Since I also mask all FPU exceptions by default and don't use any
 sophisticated SIGFPE handlers (that I know about), and don't actually
 use the option :-), I don't know how well this works in practice.
 
 diff -c2 npx.c~ npx.c
 *** npx.c~	Fri Aug 22 05:11:14 1997
 --- npx.c	Thu Sep  4 21:29:26 1997
 ***************
 *** 1,2 ****
 --- 1,3 ----
 + static int old_npxintr = 0;
   /*-
    * Copyright (c) 1990 William Jolitz.
 ***************
 *** 508,514 ****
   
   	outb(0xf0, 0);
 ! 	fnstsw(&curpcb->pcb_savefpu.sv_ex_sw);
 ! 	fnclex();
 ! 	fnop();
   
   	/*
 --- 523,538 ----
   
   	outb(0xf0, 0);
 ! 	/*
 ! 	 * Save state to work around the IRQ13 interface bugs.  If the
 ! 	 * exception 16 interface is used then the exception-pending bits
 ! 	 * will be saved and will cause another exception on the next FPU
 ! 	 * instruction in user mode (the same one unless the exception is
 ! 	 * cleared by the application).  If the IRQ13 interface is used
 ! 	 * then the exception-pending bits will be saved and will usually
 ! 	 * cause a bogus IRQ13 in the kernel when the state is restored.
 ! 	 */
 ! 	npxsave(&curpcb->pcb_savefpu);
 ! 	if (old_npxintr)
 ! 		curpcb->pcb_savefpu.sv_env.en_sw &= ~0x80bf;
   
   	/*
 
 >Some operating systems will save the FP state in the signal
 >context. This has been discussed on the freebsd-current list some time
 >ago and the general consensus seemed to be that the overhead of doing
 >so is prohibitive.  This may be the case, and if an application needs
 >to save the NPX state it can do so in its SIGFPE handler, however the
 >exceptions in the status word are currently cleared and lost.
 
 Saving the state for SIGFPE would be acceptable, but saving it for all
 signal handlers would be wasteful except possibly if the save is done
 lazily (note that lazy saving is not implemented even for ordinary
 context switches and doing it is even further away for the SMP case).
 However, I think presenting the exception state as is to the SIGFPE
 handler is best.  If it knows enough to do fixups, then it can know
 enough to clear the state for itself.
 
 Bruce
State-Changed-From-To: open->closed 
State-Changed-By: johan 
State-Changed-When: Wed Oct 10 12:48:10 PDT 2001 
State-Changed-Why:  
A lot has happened in this area of the file, and the
patch no longer applies cleanly.

If this is still a problem in more recent versions of  
FreeBSD please open a new PR. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=4597 
>Unformatted:
