From nobody  Mon May 11 03:54:42 1998
Received: (from nobody@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id DAA19200;
          Mon, 11 May 1998 03:54:42 -0700 (PDT)
          (envelope-from nobody)
Message-Id: <199805111054.DAA19200@hub.freebsd.org>
Date: Mon, 11 May 1998 03:54:42 -0700 (PDT)
From: will@iki.fi
To: freebsd-gnats-submit@freebsd.org
Subject: SMP idle cpl breaks signal forwarding
X-Send-Pr-Version: www-1.0

>Number:         6587
>Category:       kern
>Synopsis:       SMP idle cpl breaks signal forwarding
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon May 11 04:00:02 PDT 1998
>Closed-Date:    Thu Jul 15 00:47:47 PDT 1999
>Last-Modified:  Thu Jul 15 00:49:49 PDT 1999
>Originator:     Ville-Pertti Keinonen
>Release:        -current
>Organization:
>Environment:
Should occur on any SMP machine.
Seen in -current sources up to at least May 4th, 1998.

>Description:
The cpl either should always be 0 when entering the kernel from user
mode or no code should rely on it being 0.

Currently, some code relies on it and some doesn't. At least when the
other cpu is idle, the cpl may not correspond to what was set when
going to user mode.

This sometimes breaks signal forwarding if the signal is posted from an
interrupt handler.

What seems to be happening is this:

 - A clock interrupt occurs on an idle cpu.
 - The signal is set and forwarded.
 - Xcpuast is entered by the cpu running the process (waits on lock).
 - The idle cpu returns from the interrupt to an idle state, leaving
   the cpl as SWI_AST_MASK.
 - Xcpuast continues, sets the ast in ipending and branches to _doreti
   with the cpl left by the other cpu, so the ast isn't processed.
 - The cpl eventually gets cleared, and the ast in ipending is probably
   processed by the cpu that was idle.
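
The sequence above can be modelled in a few lines of user-space C. This
is only a sketch under stated assumptions, not kernel code: ipending,
post_ast() and doreti_try_ast() are hypothetical stand-ins for the
shared _ipending word, Xcpuast and the check in _doreti, and the bit
chosen for SWI_AST_MASK is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit only; the real definition is in <machine/ipl.h>. */
#define SWI_AST_MASK (UINT32_C(1) << 31)

/* Hypothetical stand-in for the kernel's shared _ipending word. */
static uint32_t ipending;

/* Model of Xcpuast as described above: mark the AST as pending. */
static void post_ast(void)
{
    ipending |= SWI_AST_MASK;
}

/*
 * Model of the check on the way out through _doreti: a pending AST
 * is handled only if the cpl being restored does not mask it.
 * Returns true if the AST was processed.
 */
static bool doreti_try_ast(uint32_t cpl_to_restore)
{
    if ((ipending & ~cpl_to_restore & SWI_AST_MASK) == 0)
        return false;           /* nothing pending, or masked */
    ipending &= ~SWI_AST_MASK;  /* consume the pending AST */
    return true;
}
```

With the cpl left at SWI_AST_MASK by the idle cpu, doreti_try_ast()
returns false and the AST stays in ipending until some later pass runs
with a cpl of 0, which would match the delay seen by the test program.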

I'm not sure exactly where the cpl becomes SWI_AST_MASK. I only looked
at what was going on with the cpu that was running the process that
was supposed to get the signal; the rest is speculation. My guess is
that it's restored when returning from the interrupt to the idle state
(it seems that in the idle state the cpl is set by a call to spl0).

>How-To-Repeat:
On a mostly idle SMP machine, run the following program:

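#include <signal.h>
#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <unistd.h>	/* alarm() */

/* SIGALRM handler: report delivery and quit. */
void
handler(int signo)
{
    /* printf() is not async-signal-safe; acceptable for a quick test. */
    printf("got signal\n");
    exit(0);
}

int
main(int argc, char **argv)
{
    signal(SIGALRM, handler);
    alarm(1);			/* SIGALRM is due after one second */
    for (;;)			/* spin in user mode until it arrives */
        ;
}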

Repeat several times. Running it under time(1) confirms that it is
not doing what it's supposed to, though that should be obvious enough
in any case: for me it usually takes ten seconds or more for the
signal to arrive if I do nothing.
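
As an alternative to time(1), the delay can be measured inside the
program. The following is a minimal sketch, assuming a system that
provides clock_gettime() with CLOCK_MONOTONIC; alarm_latency() and
on_alarm() are names made up for this example.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t can be written safely from it. */
static volatile sig_atomic_t got_alarm;

static void on_alarm(int signo)
{
    (void)signo;
    got_alarm = 1;
}

/*
 * Arm alarm(3) for `seconds`, busy-wait in user mode until SIGALRM
 * is delivered, and return the elapsed time in seconds.
 * Returns a negative value if the handler could not be installed.
 */
static double alarm_latency(unsigned int seconds)
{
    struct timespec start, end;

    got_alarm = 0;
    if (signal(SIGALRM, on_alarm) == SIG_ERR)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    alarm(seconds);
    while (!got_alarm)
        ;                       /* spin, like the test program */
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec) +
        (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling alarm_latency(1) from main() and printing the result should
give a value close to 1 on a healthy system; a value of several
seconds would indicate the delayed delivery described here.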

>Fix:
One workaround (there are many ways to do this) is to always reset the
cpl to 0 in _doreti before returning from an exception to user mode.

This is a workaround, not the correct fix for the underlying problem.

Adding the following lines to _doreti (in sys/i386/isa/ipl.s) seems
to work.  (From memory, not a real patch, sorry)

_doreti:
#ifdef SMP
        TEST_CIL
#endif
        FAKE_MCOUNT(_bintr)             /* init "from" _bintr -> _doreti */
        addl    $4,%esp                 /* discard unit number */
        popl    %eax                    /* cpl or cml to restore */
+	testb	$3,52(%esp)		/* going back to user mode?  */
+	jz	1f
+	xorl	%eax,%eax		/* yup, cpl should be 0.  */
+1:
doreti_next:

Note that this makes the equivalent tests done for some traps
redundant.
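
In C terms, the added test amounts to roughly the following. This is
a sketch only: it assumes the byte tested at 52(%esp) is the saved %cs
of the interrupted frame, and cpl_to_restore() is a hypothetical helper
name, not something in the kernel.

```c
#include <stdint.h>

/* Low two bits of a saved i386 %cs selector: the privilege level. */
#define SEL_RPL_MASK 3u

/*
 * Hypothetical helper: choose the cpl that _doreti should restore.
 * A nonzero privilege level in the saved %cs means the frame returns
 * to user mode, where the workaround forces the cpl to 0.
 */
static uint32_t cpl_to_restore(uint32_t saved_cs, uint32_t saved_cpl)
{
    if ((saved_cs & SEL_RPL_MASK) != 0)
        return 0;               /* going back to user mode */
    return saved_cpl;           /* kernel mode: keep the saved cpl */
}
```

A frame that returns to kernel mode keeps whatever cpl was saved, so
only the return-to-user path is affected.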

>Release-Note:
>Audit-Trail:

From: Tor Egge <Tor.Egge@idi.ntnu.no>
To: will@iki.fi
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/6587: SMP idle cpl breaks signal forwarding
Date: Mon, 11 May 1998 23:27:46 +0200

 Having a shared AST flag in an SMP configuration is wrong. 
 This is one of the bad side effects.
 
 Appended is an attempt at a workaround. Consider it my submission to 
 a kludge stacking contest.
 
 - Tor Egge
 
 Index: swtch.s
 ===================================================================
 RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
 retrieving revision 1.71
 diff -u -r1.71 swtch.s
 --- swtch.s	1998/04/06 15:44:31	1.71
 +++ swtch.s	1998/05/11 20:46:13
 @@ -49,6 +49,8 @@
  #include <machine/pmap.h>
  #include <machine/apic.h>
  #include <machine/smptests.h>		/** GRAB_LOPRIO */
 +#include <machine/ipl.h>
 +#include <machine/lock.h>
  #endif /* SMP */
  
  #include "assym.s"
 @@ -308,6 +310,10 @@
  	 *
  	 * XXX: we had damn well better be sure we had it before doing this!
  	 */
 +	CPL_LOCK
 +	andl	$~SWI_AST_MASK, _ipending 
 +	movl	$0, _cpl	/* Allow ASTs on other CPU */
 +	CPL_UNLOCK
  	movl	$FREE_LOCK, %eax
  	movl	%eax, _mp_lock
  
 @@ -357,16 +363,20 @@
  	jmp	idle_loop
  
  3:
 -#ifdef SMP
  	movl	$LOPRIO_LEVEL, lapic_tpr	/* arbitrate for INTs */
 -#endif
  	call	_get_mplock
 +	CPL_LOCK
 +	movl	$SWI_AST_MASK, _cpl	/* Disallow ASTs on other CPU */
 +	CPL_UNLOCK	
  	cmpl	$0,_whichrtqs			/* real-time queue */
  	CROSSJUMP(jne, sw1a, je)
  	cmpl	$0,_whichqs			/* normal queue */
  	CROSSJUMP(jne, nortqr, je)
  	cmpl	$0,_whichidqs			/* 'idle' queue */
  	CROSSJUMP(jne, idqr, je)
 +	CPL_LOCK
 +	movl	$0, _cpl		/* Allow ASTs on other CPU */
 +	CPL_UNLOCK
  	call	_rel_mplock
  	jmp	idle_loop
  
State-Changed-From-To: open->closed 
State-Changed-By: hoek 
State-Changed-When: Thu Jul 15 00:47:47 PDT 1999 
State-Changed-Why:  
This was fixed shortly after being submitted, using tegge's fix, by
someone who neglected to close the PR when they committed rev. 1.72 of
swtch.s on May 12, 1998.  More recently, the specific case has also been
fixed by bde, but the latter fix is redundant with regard to closing
this PR.

Thanks for the report! 
>Unformatted:
