From spatula@spatula.gulf.net  Tue Jan 14 06:15:11 1997
Received: from pompano.pcola.gulf.net (root@pompano.pcola.gulf.net [198.69.72.14])
          by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id GAA14292
          for <FreeBSD-gnats-submit@freebsd.org>; Tue, 14 Jan 1997 06:15:10 -0800 (PST)
Received: from spatula.gulf.net (root@bonito21.pcola.gulf.net [198.69.79.51]) by pompano.pcola.gulf.net (8.8.4/8.7.3) with ESMTP id IAA14493 for <FreeBSD-gnats-submit@freebsd.org>; Tue, 14 Jan 1997 08:15:02 -0600 (CST)
Received: (from spatula@localhost) by spatula.gulf.net (8.7.5/8.7.3) id IAA00430; Tue, 14 Jan 1997 08:14:58 -0600 (CST)
Message-Id: <199701141414.IAA00430@spatula.gulf.net>
Date: Tue, 14 Jan 1997 08:14:58 -0600 (CST)
From: spatula@gulf.net
Reply-To: spatula@gulf.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: page faults
X-Send-Pr-Version: 3.2

>Number:         2494
>Category:       kern
>Synopsis:       constant page faults in kernel mode
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 14 06:20:06 PST 1997
>Closed-Date:    Sun May 31 12:36:02 PDT 1998
>Last-Modified:  Sun May 31 12:38:40 PDT 1998
>Originator:     Nick Johnson
>Release:        FreeBSD 2.1-STABLE i386 (2.1.5-RELEASE)
>Organization:
>Environment:
	Pentium-100 machine, 32 megs of ram.  Typical FBSD 2.1.5 install.
	

>Description:

	System pukes upon itself regularly.  The debugger in the kernel 
indicates that the cause of death is a page fault while in kernel mode.  
This is almost always the error.  Further information indicates that the 
error is a result of a page-not-present.  Severity and frequency of the 
problem increase when external cache is turned on, and decrease slightly 
when it is disabled.  Severity and frequency also appear to increase if 
X-windows is run.

	Hardware has been thoroughly tested for memory controller faults 
and bad cache/simms with everything testing out fine.

	

>How-To-Repeat:

	Boot.

	

>Fix:

	Unknown at this time; some help can be found by disabling the 
external cache memory, but the problem still occurs.	

	

>Release-Note:
>Audit-Trail:

From: j@uriah.heep.sax.de (J Wunsch)
To: spatula@gulf.net
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/2494: page faults
Date: Wed, 15 Jan 1997 09:30:05 +0100

 As spatula@gulf.net wrote:
 
 >   Severity and frequency of the 
 > problem increase when external cache is turned on, and decrease slightly 
 > when it is disabled.  Severity and frequency also appear to increase if 
 > X-windows is run.
 > 
 > 	Hardware has been thoroughly tested for memory controller faults 
 > and bad cache/simms with everything testing out fine.
 
 No, it didn't test out fine, apparently.  A FreeBSD `make world' is
 commonly agreed to be a much better hardware test than anything
 else.
 
 Unless your page faults repeatedly appear at similar addresses, all
 this smells like bad RAM.  You need at least to provide us with kernel
 stack traces if the fault is repeatable at a single spot.
 
 I will eventually change the status of this PR to `feedback', since
 it's plain useless for us in the current state.  The information
 presented is simply too weak to track anything by it.
 
 -- 
 cheers, J"org
 
 joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
 Never trust an operating system you don't have sources for. ;-)

From: Prisoner <spatula@gulf.net>
To: Joerg Wunsch <joerg_wunsch@uriah.heep.sax.de>
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/2494: page faults
Date: Wed, 15 Jan 1997 07:18:52 -0600 (CST)

 On Wed, 15 Jan 1997, J Wunsch wrote:
 
 > No, it didn't test out fine, apparently.  A FreeBSD `make world' is
 > commonly agreed to be a much better hardware test than anything
 > else.
 
    Perhaps I should rephrase: everything I can do to test it has failed to
 show a problem, including 9-10 hours each of several diagnostic programs
 running in a much lamer operating system. 
 
 > Unless your page faults repeatedly appear at similar addresses, all
 > this smells like bad RAM.  You need at least to provide us with kernel
 > stack traces if the fault is repeatable at a single spot.
 
    The page fault is almost always exactly the same.  Here's the debugger 
 information from the last (and most common) fault:
 
 fault virtual address	= 0x7200c4c
 fault code		= supervisor read, page not present
 instruction pointer	= 0x8:0xf017c4b4
 code segment		= base 0x0, limit 0xfffff, type 0x1b
 			= DPL 0, pres 1, def32 1, gran 1
 processor eflags	= trace/trap, interrupt enabled, resume, IOPL=0
 current process		= 4 (update)
 interrupt mask	=
 kernel: type 12 trap, code=0
 breakpoint at _ffs_update +0xa4:  cmpl  $0x1,0x52c(%ebx)
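
The "fault code" and "code=0" lines in the dump above describe the same thing. On i386 the page-fault error code uses bit 0 for "page present", bit 1 for "write access" and bit 2 for "user mode", so a value of 0 reads as "supervisor read, page not present". A minimal sketch of that decoding in C follows; the macro, struct and function names are hypothetical and are not taken from the FreeBSD sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the i386 page-fault error code (names are illustrative). */
#define PF_PRESENT 0x1u /* set: protection violation; clear: page not present */
#define PF_WRITE   0x2u /* set: write access; clear: read access */
#define PF_USER    0x4u /* set: fault in user mode; clear: supervisor mode */

struct pf_info {
    bool present;
    bool write;
    bool user;
};

/* Split a page-fault error code into its three basic flags. */
struct pf_info decode_pf_code(uint32_t code)
{
    struct pf_info info;

    info.present = (code & PF_PRESENT) != 0;
    info.write = (code & PF_WRITE) != 0;
    info.user = (code & PF_USER) != 0;
    return info;
}
```

Calling decode_pf_code(0) leaves all three flags false, which matches the "supervisor read, page not present" line in the dump.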
 
    It's always within a few instructions of this location.  I am now 
 experimenting with eliminating various programs from running to see if 
 anything is hosing things up.  I think I have a conclusive result, but I 
 don't want to say anything until I can prove it.
 
    Nick
 
 --
 "Your views are not important"
   - Nyder, from Doctor Who: Genesis of the Daleks
 Nick Johnson, not to be trifled with. http://www.gulf.net/~spatula/
 

From: j@uriah.heep.sax.de (J Wunsch)
To: spatula@gulf.net (Prisoner)
Cc: freebsd-gnats-submit@freefall.freebsd.org
Subject: Re: kern/2494: page faults
Date: Thu, 16 Jan 1997 20:23:43 +0100

 As Prisoner wrote:
 
 >  fault virtual address	= 0x7200c4c
 >  fault code		= supervisor read, page not present
 >  instruction pointer	= 0x8:0xf017c4b4
 >  code segment		= base 0x0, limit 0xfffff, type 0x1b
 >  			= DPL 0, pres 1, def32 1, gran 1
 >  processor eflags	= trace/trap, interrupt enabled, resume, IOPL=0
 >  current process		= 4 (update)
 >  interrupt mask	=
 >  kernel: type 12 trap, code=0
 >  breakpoint at _ffs_update +0xa4:  cmpl  $0x1,0x52c(%ebx)
 >  
 >     It's always within a few instructions of this location.
 
 This is in /sys/ufs/ffs/ffs_inode.c:
 
 int
 ffs_update(ap)
 	struct vop_update_args /* {
 		struct vnode *a_vp;
 		struct timeval *a_access;
 		struct timeval *a_modify;
 		int a_waitfor;
 	} */ *ap;
 {
 ...
 	fs = ip->i_fs;
 	/*
 	 * Ensure that uid and gid are correct. This is a temporary
 	 * fix until fsck has been changed to do the update.
 	 */
 	if (fs->fs_inodefmt < FS_44INODEFMT) {		/* XXX */
 	    ^^^^
 	    here
 
 If it were a genuine bug in the code, it should always happen at the
 same spot, not just `somewhere around'.
 
 The fault VA looks suspicious: subtracting the 0x52c displacement of
 the faulting instruction gives 0x7200c4c - 0x52c = 0x7200720.  Somehow,
 the ip->i_fs pointer has been trashed by dumping the short value 0x720
 over both of its halves.  Incidentally, this value is just a space in
 the video screen buffer, together with the default attribute 0x7 (light
 gray on black).  It looks like some of your screen updates are running
 wild into memory.
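
The arithmetic behind this diagnosis can be checked with a short C sketch. It assumes the usual text-mode layout of one 16-bit cell per character, with the character code in the low byte and the attribute in the high byte. The struct and function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* One text-mode screen cell: character in the low byte, attribute in the high byte. */
struct vga_cell {
    uint8_t ch;
    uint8_t attr;
};

/* Recover the base register value from the fault address and the
 * displacement encoded in the faulting instruction. */
uint32_t base_from_fault(uint32_t fault_va, uint32_t disp)
{
    return fault_va - disp;
}

/* Interpret one 16-bit half of the trashed pointer as a screen cell. */
struct vga_cell decode_cell(uint16_t word)
{
    struct vga_cell cell;

    cell.ch = (uint8_t)(word & 0xff);
    cell.attr = (uint8_t)(word >> 8);
    return cell;
}
```

With the values from the trap dump, base_from_fault(0x7200c4c, 0x52c) yields 0x07200720. Each 16-bit half is 0x0720, which decode_cell() splits into character 0x20 (a space) and attribute 0x07.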
 
 -- 
 cheers, J"org
 
 joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
 Never trust an operating system you don't have sources for. ;-)
State-Changed-From-To: open->closed 
State-Changed-By: steve 
State-Changed-When: Sun May 31 12:36:02 PDT 1998 
State-Changed-Why:  
Hmm... Mail to the originator bounced with what had to 
be the most out-of-line message ever, so I will close this 
PR with the following remarks:  No, I am not a spammer 
and I ain't going to hell.  I've been there and didn't 
care for it much! :p 
>Unformatted:
