From peter@newton.dialix.com.au  Tue Oct  8 22:03:35 1996
Received: from newton.dialix.com.au (newton.dialix.com.au [192.203.228.8])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id WAA05985
          for <FreeBSD-gnats-submit@freebsd.org>; Tue, 8 Oct 1996 22:03:29 -0700 (PDT)
Received: (from peter@localhost)
          by newton.dialix.com.au (8.7.6/8.7.3) id NAA02004;
          Wed, 9 Oct 1996 13:03:19 +0800 (WST)
Message-Id: <199610090503.NAA02004@newton.dialix.com.au>
Date: Wed, 9 Oct 1996 13:03:19 +0800 (WST)
From: Peter Wemm <peter@haywire.dialix.com>
Reply-To: peter@newton.dialix.com.au
To: FreeBSD-gnats-submit@freebsd.org
Subject: run queue or proc list smashed 4 times in 2 days
X-Send-Pr-Version: 3.2

>Number:         1744
>Category:       kern
>Synopsis:       run queue or proc list smashed 4 times in 2 days
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    peter
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct  8 22:10:01 PDT 1996
>Closed-Date:    Mon Sep 25 12:58:21 PDT 2000
>Last-Modified:  Mon Sep 25 12:58:50 PDT 2000
>Originator:     Peter Wemm
>Release:        FreeBSD 2.2-961004-SNAP i386
>Organization:
What, here? :-)
>Environment:

Vanilla i486 box, 16M, 2 IDE drives and one slow SCSI drive on an AHA1542CF.
FreeBSD newton.dialix.com.au 2.2-961004-SNAP FreeBSD 2.2-961004-SNAP #30: Tue Oct  8 06:34:52 WST 1996     peter@newton.dialix.com.au:/home2/src/sys/compile/NEWTON  i386

>Description:

Normally, this is a quiet machine, but it's taken a nose-dive in stability
in the last two days.

It's been faulting like this:

WARNING: / was not properly dismounted.

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xf01aa108
stack pointer           = 0x10:0xefbffe0c
frame pointer           = 0x10:0xefbffe30
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty bio
panic: page fault

Syncing disks...

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x10
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xf012925a
stack pointer           = 0x10:0xefbffc88
frame pointer           = 0x10:0xefbffc98
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty bio
panic: page fault

dumping to dev 20001, offset 32768
dump 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

In this particular case, it died in cpu_switch about line 364:
        /* XX update whichqs? */
        btrl    %ebx,%edi                       /* clear q full status */
        leal    _qs(,%ebx,8),%eax               /* select q */
        movl    %eax,%esi

        movl    P_FORW(%eax),%ecx               /* unlink from front of process q */
        movl    P_FORW(%ecx),%edx
        movl    %edx,P_FORW(%eax)
        movl    P_BACK(%ecx),%eax
        movl    %eax,P_BACK(%edx)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        cmpl    P_FORW(%ecx),%esi               /* q empty */
        je      3f
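
For readers less at home in i386 assembly, here is a rough C rendering of what
the excerpt does: each run queue is a circular doubly linked list whose head
sits in qs[], whichqs holds one "non-empty" bit per queue, and the first
process is unlinked from the front.  This is only a sketch under assumptions:
struct proc is cut down to its two link fields plus a pid, and rq_init,
rq_append, rq_take_first and the sanity checks are hypothetical helpers for
illustration, not the kernel's code.  Assuming p_back is the second 32-bit
pointer in struct proc on i386 (as in 4.4BSD), the marked store executed with
%edx == NULL would write to address 0x4, which matches the first trap's
"supervisor write" fault at 0x4.

```c
#include <assert.h>
#include <stddef.h>

#define NQS 32   /* number of run queues, one bit each in whichqs */

/* Hypothetical, simplified stand-in for struct proc: only the run-queue
 * links touched by the excerpt, plus a pid for identification. */
struct proc {
    struct proc *p_forw;   /* P_FORW: next process on the run queue */
    struct proc *p_back;   /* P_BACK: previous process on the run queue */
    int          p_pid;
};

/* Queue heads act as sentinels of circular doubly linked lists.  (The
 * kernel uses a smaller 8-byte header laid out like the start of proc.) */
static struct proc  qs[NQS];
static unsigned int whichqs;  /* bit i set means qs[i] is believed non-empty */

/* Hypothetical helper: make every queue empty (head points to itself). */
static void rq_init(void)
{
    for (int i = 0; i < NQS; i++)
        qs[i].p_forw = qs[i].p_back = &qs[i];
    whichqs = 0;
}

/* Hypothetical helper: append p to the tail of queue pri, mark it non-empty. */
static void rq_append(struct proc *p, int pri)
{
    assert(pri >= 0 && pri < NQS);
    struct proc *q = &qs[pri];
    p->p_forw = q;
    p->p_back = q->p_back;
    q->p_back->p_forw = p;
    q->p_back = p;
    whichqs |= 1u << pri;
}

/* C rendering of the excerpt: unlink the first process on queue pri.
 * The sanity checks are an addition; the assembly performs none, so a
 * NULL p_forw there turns into a store through P_BACK(NULL). */
static struct proc *rq_take_first(int pri)
{
    assert(pri >= 0 && pri < NQS);
    struct proc *q = &qs[pri];          /* leal _qs(,%ebx,8),%eax */
    struct proc *p = q->p_forw;         /* movl P_FORW(%eax),%ecx */

    if (p == NULL || p == q || p->p_back != q || p->p_forw == NULL)
        return NULL;                    /* list smashed, or bit set on an empty queue */

    struct proc *next = p->p_forw;      /* movl P_FORW(%ecx),%edx */
    whichqs &= ~(1u << pri);            /* btrl %ebx,%edi */
    q->p_forw = next;                   /* movl %edx,P_FORW(%eax) */
    next->p_back = p->p_back;           /* movl %eax,P_BACK(%edx): the faulting store */
    if (next != q)                      /* cmpl P_FORW(%ecx),%esi; je 3f */
        whichqs |= 1u << pri;           /* queue still has entries */
    return p;
}
```

For example, after rq_init() and appending two processes to queue 3,
rq_take_first(3) returns them in FIFO order and clears bit 3 of whichqs only
once the queue is empty.  Setting the first entry's p_forw to NULL makes
rq_take_first() return NULL, where the unchecked assembly would fault instead.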

The backtrace looks like this:
[.. rest of trap processing ..]
#13 0xf01a2ce1 in calltrap ()
#14 0xf010e6bd in tsleep ()
#15 0xf0120327 in sbwait ()
#16 0xf011f0e3 in soreceive ()
#17 0xf0121b90 in recvit ()
#18 0xf0121dff in recvfrom ()
#19 0xf01ab0d3 in syscall ()
#20 0xf01a2d35 in Xsyscall ()

The process that was running was either of:
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
    1   176     4   0   2  0   208    0 sbwait SWs   ??    0:00.00  (rwhod)
    0 27386     4   1   2  0   148    0 sbwait Ss    ??    0:00.00  (comsat)

This particular kernel is not running any modified code.

The other three dumps were quite similar, but I didn't have the disk space
at the time to save them for analysis.

>How-To-Repeat:

I don't think this box is doing anything unusual, apart from cvsup, which
makes it sweat a fair bit.  (A 6.5MB process on a 16M machine that's doing
other things is hard work. :-)

>Fix:
	
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->analyzed 
State-Changed-By: peter 
State-Changed-When: Mon Oct 28 09:50:52 PST 1996 
State-Changed-Why:  
Something appears to depend on how the kernel is linked.  A kernel
rebuild (changing "cpu i486" and "cpu i586" to just "cpu i586") made
the problem go away.  Another machine had its kernel rebuilt at the same
time and started getting the same problem.  Now that the other machine has
been rebuilt with a trivial change (pseudo-device vn 4 -> 2) and config -g,
it too seems to have stopped exploding.  I'm at a loss to explain it.
State-Changed-From-To: analyzed->closed 
State-Changed-By: phk 
State-Changed-When: Thu Sep 18 07:47:20 PDT 1997 
State-Changed-Why:  

I'm sure you've liked this one, right?
State-Changed-From-To: closed->open 
State-Changed-By: peter 
State-Changed-When: Fri Sep 19 18:57:58 PDT 1997 
State-Changed-Why:  
Actually, no, it's not fixed; it happened two days ago on another machine
running a three-day-old 2.2-stable kernel.  That particular machine regularly
has VM deadlocks.  (It's only got 16MB of RAM, and running a couple of
incoming sendmails will kill it.  The MaxDaemonChildren sendmail setting
has been lowered to 2. :-( )
Responsible-Changed-From-To: freebsd-bugs->peter 
Responsible-Changed-By: phk 
Responsible-Changed-When: Sat Apr 11 03:58:21 PDT 1998 
Responsible-Changed-Why:  
Peter, if this is no longer a problem, then just close. 

Could this be a broken 486 cache? -Matt
State-Changed-From-To: open->closed 
State-Changed-By: peter 
State-Changed-When: Mon Sep 25 12:58:21 PDT 2000 
State-Changed-Why:  
These machines are long gone. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=1744 
>Unformatted:
