From dillon@backplane.com  Sun Jun 14 02:55:50 1998
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id CAA14618
          for <FreeBSD-gnats-submit@freebsd.org>; Sun, 14 Jun 1998 02:55:42 -0700 (PDT)
          (envelope-from dillon@backplane.com)
Received: (dillon@localhost) by apollo.backplane.com (8.8.8/8.6.5) id CAA00381; Sun, 14 Jun 1998 02:55:34 -0700 (PDT)
Message-Id: <199806140955.CAA00381@apollo.backplane.com>
Date: Sun, 14 Jun 1998 02:55:34 -0700 (PDT)
From: Matthew Dillon <dillon@backplane.com>
Reply-To: dillon@backplane.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: bug in i386/isa/icu_ipl.s - AST gets lost, causes extreme network slowdown when cpu-bound processes present, possibly other problems
X-Send-Pr-Version: 3.2

>Number:         6944
>Category:       i386
>Synopsis:       icu_ipl.s has a case commented as "can't happen" which does happen
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jun 14 03:00:01 PDT 1998
>Closed-Date:    Sat Jul 10 07:54:18 PDT 1999
>Last-Modified:  Sat Jul 10 07:57:36 PDT 1999
>Originator:     Matthew Dillon
>Release:        FreeBSD 3.0-CURRENT i386
>Organization:
BEST Internet Communications
>Environment:

	FreeBSD-current, June 14 cvs update, PPro 200 w/ PCI 10/100BaseT
	and SCSI & IDE disks

>Description:

	If FreeBSD is running cpu-bound processes and a network interrupt
	occurs that triggers swi_net, and the network swi queues an AST
	(for example, by waking up nfsd), the AST gets lost and nfsd does
	not get the cpu until the next clock interrupt.

	This can create severe NFS slowdowns as well as slowdowns to processes
	being woken up from the network (or even other interrupts).

	The problem is in icu_ipl.s.  The situation:

	    cmpl	$SWI_AST,%ecx
	    je		splz_next		/* "can't happen" */

	Actually can happen.  I'm not exactly sure how it happens, but the
	result is that the AST gets cleared from ipending without being run.

	The patch I include re-sets the bit in ipending and also sets the
	bit in the temporary CPL to prevent it from trying to dispatch it
	again in the routine. 

	This fixes the problem completely on my machine... now I can run
	cpu bound programs and bring up xterms and nfsd is no longer affected
	by the cpu-bound processes.  I am marking the bug critical & high
	priority because it's in the core interrupt code, but the bug does
	not cause a fatal condition to occur (it just slows down process
	wakeups massively in the face of a cpu-bound program).
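
	A minimal C model of the dispatch loop may make the failure easier
	to see.  This is a sketch, not the kernel code: struct ipl_state,
	model_splz() and lowest_bit() are hypothetical names, hardware and
	software handlers are treated alike, lowest-bit-first ordering is
	assumed, and SWI_AST is assumed to be bit 31 to match the
	0x80000000 constant used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SWI_AST      31u                        /* assumed bit number (0x80000000) */
#define SWI_AST_MASK (UINT32_C(1) << SWI_AST)

/* Simplified model state (hypothetical; the kernel keeps these as globals). */
struct ipl_state {
    uint32_t cpl;       /* interrupt bits currently masked */
    uint32_t ipending;  /* interrupts waiting to be dispatched */
    unsigned ran;       /* number of handlers the model "called" */
};

/* Index of the lowest set bit in v. */
static unsigned lowest_bit(uint32_t v)
{
    unsigned i = 0;

    assert(v != 0);
    while (((v >> i) & 1u) == 0)
        i++;
    return i;
}

/* Model of splz(): dispatch everything pending that cpl does not mask. */
static void model_splz(struct ipl_state *s, int patched)
{
    uint32_t tmp_cpl = s->cpl;      /* the "temporary CPL" held in %eax */
    uint32_t todo;

    while ((todo = s->ipending & ~tmp_cpl) != 0) {
        unsigned bit = lowest_bit(todo);

        s->ipending &= ~(UINT32_C(1) << bit);   /* cleared before dispatch */
        if (bit == SWI_AST) {
            if (patched) {
                tmp_cpl |= SWI_AST_MASK;        /* do not pick it again here */
                s->ipending |= SWI_AST_MASK;    /* keep the AST pending */
            }
            continue;                           /* splz never runs the AST itself */
        }
        s->ran++;                               /* stands in for "call %edx" */
    }
}
```

	Starting from cpl 0 with the AST bit and one other bit pending,
	model_splz(&s, 0) ends with ipending == 0 although the AST never
	ran, while model_splz(&s, 1) ends with the AST bit still set in
	ipending.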

>How-To-Repeat:

	I noticed this trying to run an xterm on my diskless workstation
	while running crack in the background on my server.  The xterm took
	30 seconds to come up.  After killing the cpu-bound crack, the xterm
	took 2 seconds to come up.


>Fix:

	Notice!  This patch fixes icu_ipl.s.  apic_ipl.s might have the
	same problem.
	
Index: icu_ipl.s
===================================================================
RCS file: /src/FreeBSD-CVS/ncvs/src/sys/i386/isa/icu_ipl.s,v
retrieving revision 1.3
diff -c -r1.3 icu_ipl.s
*** icu_ipl.s	1997/09/02 19:40:13	1.3
--- icu_ipl.s	1998/05/31 09:36:16
***************
*** 107,119 ****
  	ALIGN_TEXT
  splz_swi:
  	cmpl	$SWI_AST,%ecx
! 	je	splz_next		/* "can't happen" */
  	pushl	%eax
  	orl	imasks(,%ecx,4),%eax
  	movl	%eax,_cpl
  	call	%edx
  	popl	%eax
  	movl	%eax,_cpl
  	jmp	splz_next
  
  /*
--- 107,124 ----
  	ALIGN_TEXT
  splz_swi:
  	cmpl	$SWI_AST,%ecx
! 	je	splz_nextx		/* "can't happen" XXX can happen! */
  	pushl	%eax
  	orl	imasks(,%ecx,4),%eax
  	movl	%eax,_cpl
  	call	%edx
  	popl	%eax
  	movl	%eax,_cpl
+ 	jmp	splz_next
+ 
+ splz_nextx:
+ 	orl	$0x80000000,%eax
+ 	orl	$0x80000000,_ipending
  	jmp	splz_next
  
  /*
>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: dillon@backplane.com, FreeBSD-gnats-submit@FreeBSD.ORG
Cc:
Subject: Re: i386/6944: bug in i386/isa/icu_ipl.s - AST gets lost, causes extreme network slowdown when cpu-bound processes present, possibly other problems
Date: Sun, 14 Jun 1998 21:38:02 +1000

 >	The problem is in icu_ipl.s.  The situation:
 >
 >	    cmpl	$SWI_AST,%ecx
 >	    je		splz_next		/* "can't happen" */
 >
 >	Actually can happen.  I'm not exactly sure how it happens, but the
 >	result is that the AST gets cleared from ipending without being run.
 
 It "can't happen" because SWI_AST_MASK is "always" set in `cpl' until
 the kernel is about to return to user mode.  Something must be clearing
 SWI_AST_MASK in `cpl' or in the cpl to be "restored".  The typo spl(0)
 instead of spl0() would do this.  Please look for whatever does it.
 This may be as simple as looking at the stack trace to see spl(0) and
 verifying that SWI_AST_MASK is set (you can't trust the latter since
 ddb doesn't mask interrupts).
 
 Bruce
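
 The invariant described in the reply above can be sketched in a few
 lines of C.  This is an illustration only: dispatchable() and
 ast_visible() are hypothetical helpers, and SWI_AST_MASK is assumed to be
 0x80000000 as in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SWI_AST_MASK UINT32_C(0x80000000)   /* assumed value, taken from the patch */

/* Pending interrupts that the given cpl does not mask. */
static uint32_t dispatchable(uint32_t ipending, uint32_t cpl)
{
    return ipending & ~cpl;
}

/* Nonzero if a dispatch loop could select the AST bit with this cpl. */
static int ast_visible(uint32_t ipending, uint32_t cpl)
{
    assert((ipending & SWI_AST_MASK) != 0);  /* only meaningful with an AST pending */
    return (dispatchable(ipending, cpl) & SWI_AST_MASK) != 0;
}
```

 While SWI_AST_MASK stays set in cpl, ast_visible() is 0 and the "can't
 happen" branch is unreachable.  Once something stores a cpl without that
 bit (a plain 0, for instance), ast_visible() becomes 1.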

From: Matthew Dillon <dillon@backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/6944: bug in i386/isa/icu_ipl.s - AST gets lost, causes extreme network slowdown when cpu-bound processes present, possibly other problems
Date: Sun, 14 Jun 1998 10:54:42 -0700 (PDT)

 :>	    cmpl	$SWI_AST,%ecx
 :>	    je		splz_next		/* "can't happen" */
 :>
 :>	Actually can happen.  I'm not exactly sure how it happens, but the
 :>	result is that the AST gets cleared from ipending without being run.
 :
 :It "can't happen" because SWI_AST_MASK is "always" set in `cpl' until
 :the kernel is about to return to user mode.  Something must be clearing
 :SWI_AST_MASK in `cpl' or in the cpl to be "restored".  The typo spl(0)
 :instead of spl0() would do this.  Please look for whatever does it.
 :This may be as simple as looking at the stack trace to see spl(0) and
 :verifying that SWI_AST_MASK is set (you can't trust the latter since
 :ddb doesn't mask interrupts).
 :
 :Bruce
 
     Well, I spent 6 hours from 9p.m. to 3a.m. just to find this :-)  I'm going
     to leave the finding of the broken spl to someone else, but there ARE
     several places where $0 is loaded into the cpl in the assembly, and 
     other places where the interrupt nesting count is manually reset to 1.
     I'm not sure it's necessary to 'reset' the cpl states; the standard
     interrupt context push/pop ought to do that inherently.  So if things
     are being left dangling, there's definitely something wrong elsewhere
     in the code that these manual resets are 'covering up'.  It could be
     anywhere.
     The spl0()/splz() stuff is a mess and should probably be removed entirely.
 
     The problem is extremely reproducible... just NFS mount / and /usr from
     a server to a workstation, run a for (;;); process on the server, and
     try to run xterm on the workstation and, poof. 
 
     When I did this, vmstat showed the number of context switches never 
     exceeded 100.  Hmm... suspicious!  Without ./x (the for (;;); process)
     running, the number of context switches went to 600+/sec for two seconds
     to load xterm via NFS.  With ./x running the number of context switches
     was around 50/sec and running xterm on the client increased it to only 
     100/sec, and xterm took forever to load via nfs.
 
     With the fix and ./x running, xterm took only 2 seconds to load via
     NFS and was completely unaffected by the existence of the cpu-bound
     task.
 
     -
 
     I'd suggest changing the assembly to do a sanity check of the cpl rather
     than simply save/restore it around an SWI (or normal interrupt for that
     matter)... if the cpl isn't in the state it left it before the call
     to the handler, printf() a warning.
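
     In C terms the suggested check might look like the sketch below.  It is
     only an illustration of the idea: run_handler_checked(), the global cpl
     stand-in, and the two sample handlers are hypothetical, and the real
     change would have to be made in the assembly dispatch code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*swi_handler_t)(void);

static uint32_t cpl;    /* stand-in for the kernel's current priority level */

/* Sample handlers for the sketch: one well behaved, one that clobbers cpl. */
static void good_handler(void) { }
static void bad_handler(void)  { cpl = 0; }

/*
 * Run a handler with extra bits masked, then verify that the handler left
 * cpl as it found it.  Returns 1 if the handler changed cpl, else 0.
 */
static int run_handler_checked(swi_handler_t handler, uint32_t mask)
{
    uint32_t saved = cpl;
    uint32_t during = saved | mask;
    int clobbered;

    assert(handler != NULL);
    cpl = during;
    handler();
    clobbered = (cpl != during);
    if (clobbered)
        printf("warning: handler left cpl=%#x, expected %#x\n",
            (unsigned)cpl, (unsigned)during);
    cpl = saved;        /* restore, as the existing save/restore does */
    return clobbered;
}
```

     run_handler_checked(good_handler, 0x10) returns 0 silently, while
     run_handler_checked(bad_handler, 0x10) prints the warning and returns 1.
     Either way cpl is restored afterwards.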
 
     I also noticed that the fast interrupt code doesn't save/restore the cpl
     around the call to the interrupt handler, but the 'normal' interrupt 
     code does.  I believe the code thinks this is ok because it's leaving
     the cpu CLI'd through the call, but I actually think the slow interrupt
     handler results in faster operation because the interrupt context doesn't
     get popped & repushed through a ring change if a nested interrupt occurs.
     I also submit that the fast interrupt code doesn't make the system any 
     more responsive...  the two critical time-sensitive interrupts are the 
     ethernet rx and the serial rx and neither is able to keep up as it
     stands... our 100BaseTX boards almost universally get knocked back into
     store and forward mode due to rx overruns after the machine's been up 
     for a while, and anyone with a digital camera can tell you that the 
     serial interrupt sucks rocks in terms of being able to process exceptions 
     at a high rate in unhandshaked mode without overrunning.  Running a 
     'fast' interrupt with interrupts disabled isn't a hot idea when the 'fast'
     interrupt isn't the serial or network receive interrupt!
 
     But whatever the case, the core assembly shouldn't be gratuitously
     clearing the AST from ipending if it doesn't intend to run the AST
     trap :-)
 
 						-Matt
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]
 

From: Bruce Evans <bde@zeta.org.au>
To: bde@zeta.org.au, dillon@backplane.com
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/6944: bug in i386/isa/icu_ipl.s - AST gets lost, causes extreme network slowdown when cpu-bound processes present, possibly other problems
Date: Mon, 15 Jun 1998 05:01:45 +1000

 >    Well, I spent 6 hours from 9p.m. to 3a.m. just to find this :-)  I'm going
 >    to leave the finding of the broken spl to someone else, but there ARE
 >    several places where $0 is loaded into the cpl in the assembly, and 
 
 Mainly SMP places in swtch.s and VM86 places in ipl.s.  I can't see any
 relevant problems there (except it's hard to see anything because of the
 ifdefs).
 
 >    other places where the interrupt nesting count is manually reset to 1.
 
 That should just be an optimization.  incl/decl should work, but takes one
 more memory access.  Unfortunately, this optimization tends to hide bugs.
 
 >    The problem is extremely reproducible... just NFS mount / and /usr from
 >    a server to a workstation, run a for (;;); process on the server, and
 >    try to run xterm on the workstation and, poof. 
 
 Not reproducible here.  I'll try using VM86.
 
 >    I also noticed that the fast interrupt code doesn't save/restore the cpl
 >    around the call to the interrupt handler, but the 'normal' interrupt 
 >    code does.  I believe the code thinks this is ok because it's leaving
 
 cpl has no effect on fast interrupt handlers.
 
 >    the cpu CLI'd through the call, but I actually think the slow interrupt
 >    handler results in faster operation because the interrupt context doesn't
 >    get popped & repushed through a ring change if a nested interrupt occurs.
 
 No, slow interrupt handlers do a few more integer operations, and
 at least 2 more i/o operations which may each take as long as several
 hundred integer instructions.  The slow pop-repush case only occurs when
 a fast interrupt handler wants to switch to a SWI handler.  Nested
 interrupts can't occur while the fast interrupt handler is running.
 The nest level is only checked to avoid going too deep.
 
 >    I also submit that the fast interrupt code doesn't make the system any 
 >    more responsive...  the two critical time-sensitive interrupts are the 
 
 They are not supposed to.  They are supposed to make dumb hardware work
 at all.  E.g., without fast interrupts, a 16450 at 115200 bps would lose
 about 30 fifos full of input whenever someone presses CapsLock, because
 the console driver disables tty interrupts while setting the LEDs, and
 setting the LEDs takes about 3 msec.  A 16550 would lose only about 2
 fifos full :-).
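
 The figures above can be checked with a little arithmetic.  The sketch
 below assumes 10 bits on the wire per character (start bit, 8 data bits,
 stop bit), a 1-byte receive buffer for the 16450, and a 16-byte FIFO for
 the 16550.  The function names are hypothetical.

```c
#include <assert.h>

/* Characters arriving while interrupts are blocked for blocked_usec. */
static double chars_during(long bps, int bits_per_char, long blocked_usec)
{
    assert(bps > 0 && bits_per_char > 0 && blocked_usec >= 0);
    return (double)bps / bits_per_char * blocked_usec / 1e6;
}

/* How many times a receive buffer of fifo_bytes would fill in that time. */
static double fifos_filled(long bps, int bits_per_char, long blocked_usec,
    int fifo_bytes)
{
    assert(fifo_bytes > 0);
    return chars_during(bps, bits_per_char, blocked_usec) / fifo_bytes;
}
```

 Under those assumptions, chars_during(115200, 10, 3000) is about 34.6
 characters, which is roughly 30 overruns of a 1-byte buffer, and
 fifos_filled(115200, 10, 3000, 16) is about 2.2.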
 
 >    ethernet rx and the serial rx and neither is able to keep up as it
 >    stands... our 100BaseTX boards almost universally get knocked back into
 >    store and forward mode due to rx overruns after the machine's been up 
 
 Fast interrupt handlers are not used by any network drivers.  Network
 interrupts are also blocked for CapsLock if SLIP is configured or PPP is
 active :-(.
 
 >    for a while, and anyone with a digital camera can tell you that the 
 >    serial interrupt sucks rocks in terms of being able to process exceptions 
 >    at a high rate in unhandshaked mode without overrunning.  Running a 
 >    'fast' interrupt with interrupts disabled isn't a hot idea when the 'fast'
 >    interrupt isn't the serial or network receive interrupt!
 
 That's one reason why fast interrupts are only used by serial drivers.
 When getting to the interrupt handler takes 5 usec (longer on old systems)
 it doesn't hurt at all to keep interrupts disabled another 5-10 usec to do
 all the i/o that a 16450 can do, and it doesn't hurt much to keep them
 disabled another 50 usec to do all the i/o that a 16550 can do.
 
 Bruce

From: Matthew Dillon <dillon@backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: bde@zeta.org.au, FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/6944: bug in i386/isa/icu_ipl.s - AST gets lost, causes extreme network slowdown when cpu-bound processes present, possibly other problems
Date: Sun, 14 Jun 1998 14:41:35 -0700 (PDT)

 :>    Well, I spent 6 hours from 9p.m. to 3a.m. just to find this :-)  I'm going
 :>    to leave the finding of the broken spl to someone else, but there ARE
 :>    several places where $0 is loaded into the cpl in the assembly, and 
 :
 :Mainly SMP places in swtch.s and VM86 places in ipl.s.  I can't see any
 :relevant problems there (except it's hard to see anything because of the
 :ifdefs).
 
     Yah, I checked that out... but the problem was still there after turning
     off VM86, and I've got SMP turned off.  I also have APIC_IO turned off.
 
 :>    The problem is extremely reproducible... just NFS mount / and /usr from
 :>    a server to a workstation, run a for (;;); process on the server, and
 :>    try to run xterm on the workstation and, poof. 
 :
 :Not reproducible here.  I'll try using VM86.
 
     Damn.  I was hoping it would be easily reproducible.  I see the same
     effect on our 2.2.x production machines.
 
     The diskless workstation is a -current kernel talking to a -current 
     server.  nfsd -n 4 on the server, nfsiod -n 2 on the client.  In one
     of my iterations I was dumping debugging printf()'s in the kernel,
     which showed that while the forever process ./x was running, the nfs 
     socket wakeup WAS setting the AST in ipending along with want_resched,
     but that nfsd wasn't getting woken up until the next timer tick.  If
     the machine was idle (no forever process), nfsd would get the cpu 
     immediately.
 
     Also, another point of reference:  If the forever process is running but
     with a system call in a tight loop, like this:
 
 	for (;;)
 	    write(1, buf, 0);
 
     Then the nfsd would get woken up instantly for each request.  But if the
     forever process was running like this:
 
 	for (;;)
 	    ;
 
     The nfsd would not get woken up until the next timer tick.
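
     For anyone wanting to compare the two cases side by side, both loops
     can be put into one small file.  This is a sketch: the function names
     and the iteration bound are additions for illustration (the loops above
     run forever).  The difference in behaviour is presumably because the
     write() variant passes through the return-to-user path on every
     iteration, while the empty loop only enters the kernel on an interrupt.

```c
#include <assert.h>
#include <unistd.h>

/* Variant 1: spin purely in user mode, like "for (;;) ;". */
static unsigned long spin_user(unsigned long iterations)
{
    volatile unsigned long n = 0;   /* volatile so the loop is not optimized away */

    assert(iterations > 0);
    while (n < iterations)
        n++;
    return n;
}

/* Variant 2: enter the kernel on every pass with a zero-length write. */
static unsigned long spin_syscall(unsigned long iterations)
{
    char buf[1] = { 0 };
    unsigned long n;

    assert(iterations > 0);
    for (n = 0; n < iterations; n++) {
        ssize_t r = write(1, buf, 0);   /* writes nothing; result is ignored */
        (void)r;
    }
    return n;
}
```

     Calling either function from main() with a very large count gives a
     cpu-bound process of the corresponding kind.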
 
 						-Matt
 
 :
 :Bruce
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]

From: Matthew Dillon <dillon@backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: bde@zeta.org.au, dillon@backplane.com, FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/6944: bug in i386/isa/icu_ipl.s - AST gets lost, causes extreme network slowdown when cpu-bound processes present, possibly other problems
Date: Sun, 14 Jun 1998 19:00:43 -0700 (PDT)

     Fubar.  I can't reproduce the xterm example on BEST's machines (running
     2.2.6).  It's definitely reproducible on my machine (3.0-current).
 
     I would like to track down what is causing the problem, so if anyone has
     any suggestions on where the cpl can be checked for illegal values,
     I'd appreciate it.  I'll retest the VM86 removal to make sure that didn't
     fix the problem and I'll test the AUTOEOI configs as well.
 
     As far as I can tell, the only two places where the bug can possibly be
     are in the interface driver (pci/if_de.c) or the UDP/IP stack.
 
 						-Matt
 
Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: unfurl 
Responsible-Changed-When: Thu Jul 1 18:44:11 PDT 1999 
Responsible-Changed-Why:  
Matt, can you either comment on this or close the PR if it is a non-issue? T 
State-Changed-From-To: open->closed 
State-Changed-By: bde 
State-Changed-When: Sat Jul 10 07:54:18 PDT 1999 
State-Changed-Why:  
Fixed in -current in rev.1.23 of intr_machdep.c. 
The bug only occurs on systems with shared interrupts. 
>Unformatted:
