From kolya@orbit.zepa.net Wed Apr 21 13:57:29 1999
Return-Path: <kolya@orbit.zepa.net>
Received: from orbit.zepa.net (orbit.zepa.net [205.245.53.14])
	by hub.freebsd.org (Postfix) with SMTP id 6530515918
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 21 Apr 1999 13:54:41 -0700 (PDT)
	(envelope-from kolya@orbit.zepa.net)
Received: (qmail 84199 invoked by uid 502); 21 Apr 1999 20:52:11 -0000
Message-Id: <19990421205211.84198.qmail@orbit.zepa.net>
Date: 21 Apr 1999 20:52:11 -0000
From: kolya@orbit.zepa.net
Reply-To: kolya@orbit.zepa.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: Page fault, fatal trap in kernel
X-Send-Pr-Version: 3.2

>Number:         11266
>Category:       kern
>Synopsis:       frequent crashes with "Page fault, fatal trap in kernel"
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Apr 21 14:00:01 PDT 1999
>Closed-Date:    Wed May 23 14:25:00 PDT 2001
>Last-Modified:  Wed May 23 14:25:11 PDT 2001
>Originator:     Nickolai Zeldovich
>Release:        FreeBSD 3.1-RELEASE i386
>Organization:
Craig's DATA Exchange
>Environment:

	PentiumII-266, FreeBSD-3.1, running Diablo news server with 4 IDE
disks on two buses, using CCD to stripe. Mostly standard kernel config
except for maxusers and memory parameters.

>Description:

	Every so often (at least once a week) our news box crashes with
error message "Page fault, fatal trap in kernel". When this happens, the
machine remains pingable and I can telnet to it, and it accepts the
connection, yet nothing comes up (e.g. login prompt). Same for trying to
connect to ssh or NNTP. Although this sounds like a problem with NMBCLUSTERS
being set too low, the kernel has it set to 3072, and that does not seem to
be the problem (netstat -m never reports a peak above 1000).
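
The headroom argument above can be checked with a little arithmetic. The
sketch below is only an illustration: the function name cluster_headroom is
hypothetical, and the two figures are the ones quoted in this report
(NMBCLUSTERS set to 3072, peak reported by netstat -m no higher than 1000).

```python
def cluster_headroom(limit: int, peak: int) -> float:
    """Return the fraction of the mbuf cluster pool still free at peak usage."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 1.0 - peak / limit


# Figures taken from the report: configured limit and observed peak.
headroom = cluster_headroom(limit=3072, peak=1000)
print(f"free at peak: {headroom:.0%}")  # prints "free at peak: 67%"
```

With roughly two thirds of the pool unused at peak, cluster exhaustion looks
unlikely on these numbers alone, although a peak seen by hand may miss a
short burst just before a crash.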

>How-To-Repeat:

	Have not determined a precise way to crash this yet. Our news machine
keeps crashing every week or so with this error. Is this a known or suspected
bug in the 3.1 kernel?

>Fix:
	
	


>Release-Note:
>Audit-Trail:

From: "Daniel C. Sobral" <dcs@newsguy.com>
To: kolya@orbit.zepa.net
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/11266: Page fault, fatal trap in kernel
Date: Thu, 22 Apr 1999 12:47:37 +0900

 kolya@orbit.zepa.net wrote:
 > 
 > >Environment:
 > 
 >         PentiumII-266, FreeBSD-3.1, running Diablo news server with 4 IDE
 > disks on two buses, using CCD to stripe. Mostly standard kernel config
 > except for maxusers and memory parameters.
 
 Please, when you report a problem, try not to be selective in
 describing environment.
 
 Say, you wouldn't, by any chance, have lots of RAM (256 MB+), and
 large maxusers (128+), would you?
 
 --
 Daniel C. Sobral			(8-DCS)
 dcs@newsguy.com
 dcs@freebsd.org
 
 	"Well, Windows works, using a loose definition of 'works'..."
 
 

From: Nickolai Zeldovich <kolya@zepa.net>
To: "Daniel C. Sobral" <dcs@newsguy.com>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/11266: Page fault, fatal trap in kernel
Date: Thu, 22 Apr 1999 01:12:52 -0400 (EDT)

 On Thu, 22 Apr 1999, Daniel C. Sobral wrote:
 
 > kolya@orbit.zepa.net wrote:
 > > 
 > > >Environment:
 > > 
 > >         PentiumII-266, FreeBSD-3.1, running Diablo news server with 4 IDE
 > > disks on two buses, using CCD to stripe. Mostly standard kernel config
 > > except for maxusers and memory parameters.
 > 
 > Please, when you report a problem, try not to be selective in
 > describing environment.
 > 
 > Say, you wouldn't, by any chance, have lots of RAM (256 MB+), and
 > large maxusers (128+), would you?
 
 Close. 256MB memory, maxusers is set at 96 right now. Other things from
 the kernel configuration:
 
 device rl0
 pseudo-device   ccd     4
 options         KTRACE
 options         SYSVSHM
 options         SYSVMSG
 pseudo-device   bpfilter 4
 options         "MAXDSIZ=(384*1024*1024)"
 options         "DFLDSIZ=(384*1024*1024)"
 options         "NMBCLUSTERS=3072"
 
 -- [ Nickolai Zeldovich // nickolai@zepa.net ]
 
 

From: "Daniel C. Sobral" <dcs@newsguy.com>
To: Nickolai Zeldovich <kolya@zepa.net>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/11266: Page fault, fatal trap in kernel
Date: Thu, 22 Apr 1999 23:29:40 +0900

 Nickolai Zeldovich wrote:
 > 
 > Close. 256MB memory, maxusers is set at 96 right now. Other things from
 > the kernel configuration:
 
 > options         "MAXDSIZ=(384*1024*1024)"
 > options         "DFLDSIZ=(384*1024*1024)"
 > options         "NMBCLUSTERS=3072"
 
 3.1-stable, for now, has a problem with large memory configurations
 when certain kernel sizes are set too high.
 
 Maxusers 96 ought to be safe, but your problem looks like the
 mem/max problem, only at a very slow rate. The problem derives from
 the kernel taking up more memory than it has space to map. Since
 this is gradually used, it takes a while for the problem to show up.
 One week is a long while, so you might be a borderline case, because
 of your other options. I suggest lowering maxusers to 90 and seeing
 whether that eliminates the problem, or at least makes your machine
 survive a while longer.
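
To make the maxusers suggestion concrete, here is a rough sketch of how
several kernel limits scale with maxusers. The formulas are an assumption:
they follow the param.c defaults of that era as best recalled (NPROC =
20 + 16 * MAXUSERS, MAXFILES = 2 * NPROC, default NMBCLUSTERS = 512 +
16 * MAXUSERS) and are not stated anywhere in this thread, so check them
against the actual source tree. The function name derived_limits is
hypothetical.

```python
def derived_limits(maxusers: int) -> dict:
    """Estimate kernel limits derived from maxusers (assumed formulas)."""
    nproc = 20 + 16 * maxusers  # assumed NPROC formula
    return {
        "maxproc": nproc,
        "maxfiles": 2 * nproc,  # assumed MAXFILES formula
        # Assumed default, used only when NMBCLUSTERS is not set explicitly.
        "default_nmbclusters": 512 + 16 * maxusers,
    }


# Compare the values discussed in the thread: 90, 96 and 128.
for maxusers in (90, 96, 128):
    print(maxusers, derived_limits(maxusers))
```

Under these assumed formulas, going from 96 to 90 shrinks each table by only
a few percent, which fits the description of this as a borderline case.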
 
 --
 Daniel C. Sobral			(8-DCS)
 dcs@newsguy.com
 dcs@freebsd.org
 
 	"Well, Windows works, using a loose definition of 'works'..."
 

From: Nickolai Zeldovich <kolya@zepa.net>
To: "Daniel C. Sobral" <dcs@newsguy.com>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/11266: Page fault, fatal trap in kernel
Date: Thu, 22 Apr 1999 16:56:01 -0400 (EDT)

 On Thu, 22 Apr 1999, Daniel C. Sobral wrote:
 
 > Maxusers 96 ought to be safe, but your problem looks like the
 > mem/max problem, only at a very slow rate. The problem derives from
 > the kernel taking up more memory than it has space to map. Since
 > this is gradually used, it takes a while for the problem to show up.
 > One week is a long while, so you might be a borderline case, because
 > of your other options. I suggest lowering maxusers to 90 and seeing
 > whether that eliminates the problem, or at least makes your machine
 > survive a while longer.
 
 Well, at least I'm not the only one having the problem. I recompiled my
 kernel without the MAXDSIZ and DFLDSIZ settings, lowering NMBCLUSTERS to
 2048 and setting maxusers to 90. The machine promptly crashed again, only
 an hour after being rebooted with a new kernel.
 
 -- [ Nickolai Zeldovich // nickolai@zepa.net ]
 
 

From: "Daniel C. Sobral" <dcs@newsguy.com>
To: Nickolai Zeldovich <kolya@zepa.net>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/11266: Page fault, fatal trap in kernel
Date: Fri, 23 Apr 1999 08:46:48 +0900

 Nickolai Zeldovich wrote:
 > 
 > On Thu, 22 Apr 1999, Daniel C. Sobral wrote:
 > 
 > > Maxusers 96 ought to be safe, but your problem looks like the
 > > mem/max problem, only at a very slow rate. The problem derives from
 > > the kernel taking up more memory than it has space to map. Since
 > > this is gradually used, it takes a while for the problem to show up.
 > > One week is a long while, so you might be a borderline case, because
 > > of your other options. I suggest lowering maxusers to 90 and seeing
 > > whether that eliminates the problem, or at least makes your machine
 > > survive a while longer.
 > 
 > Well, at least I'm not the only one having the problem. I recompiled my
 > kernel without the MAXDSIZ and DFLDSIZ settings, lowering NMBCLUSTERS to
 > 2048 and setting maxusers to 90. The machine promptly crashed again, only
 > an hour after being rebooted with a new kernel.
 
 As you noted, NMBCLUSTERS might be too low. It is actually possible
 to crash the machine with not enough NMBCLUSTERS. Set NMBCLUSTERS
 higher. Much higher. But keep maxusers at 90, just in case this
 crash is a new problem and the former one was indeed caused by
 mem/maxusers.
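
Raising NMBCLUSTERS is not free, since each cluster needs kernel address
space. The sketch below estimates the size of the cluster pool for a few
settings. It assumes a cluster size (MCLBYTES) of 2048 bytes, which is not
stated in this thread, and the function name cluster_pool_mib is
hypothetical. The value 8192 is only an example of "much higher", not a
recommendation made here.

```python
MCLBYTES = 2048  # assumed mbuf cluster size in bytes


def cluster_pool_mib(nmbclusters: int) -> float:
    """Return the maximum cluster pool size in MiB for a given NMBCLUSTERS."""
    return nmbclusters * MCLBYTES / (1024 * 1024)


# 2048 and 3072 appear in the thread; 8192 is an illustrative larger value.
for n in (2048, 3072, 8192):
    print(f"NMBCLUSTERS={n}: {cluster_pool_mib(n):.1f} MiB")
```

Because that space comes out of the same limited kernel map blamed for the
mem/maxusers problem, a larger NMBCLUSTERS is a reason to keep maxusers
conservative at the same time.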
 
 --
 Daniel C. Sobral			(8-DCS)
 dcs@newsguy.com
 dcs@freebsd.org
 
 	"Well, Windows works, using a loose definition of 'works'..."
 
 
 

From: Nickolai Zeldovich <kolya@zepa.net>
To: "Daniel C. Sobral" <dcs@newsguy.com>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/11266: Page fault, fatal trap in kernel
Date: Sat, 22 May 1999 22:34:10 -0400 (EDT)

 Hi,
 
 I suddenly noticed that 3.2-STABLE was released, and was wondering if this
 bug has been possibly fixed? I'm not sure if I want to install it on our
 production news box just to find this out, and none of our non-critical
 machines have the same load and specs.
 
 Thanks in advance,
 
 -- [ Nickolai Zeldovich // nickolai@zepa.net ]
 
 
 

From: "Daniel C. Sobral" <dcs@newsguy.com>
To: Nickolai Zeldovich <kolya@zepa.net>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/11266: Page fault, fatal trap in kernel
Date: Sun, 23 May 1999 15:56:56 +0900

 Nickolai Zeldovich wrote:
 > 
 > I suddenly noticed that 3.2-STABLE was released, and was wondering if this
 > bug has been possibly fixed? I'm not sure if I want to install it on our
 > production news box just to find this out, and none of our non-critical
 > machines have the same load and specs.
 
 In my last message I suggested a bigger NMBCLUSTERS. Did you try
 that?
 
 Also, nothing can be done about panic PRs without a kernel trace, at
 the very least. I can't tell you if the bug has been fixed because I
 don't know what the bug is. Read the section in the handbook on how
 to debug a kernel.
 
 --
 Daniel C. Sobral			(8-DCS)
 dcs@newsguy.com
 dcs@freebsd.org
 
 	"If at first you don't succeed, skydiving is not for you."
 
State-Changed-From-To: open->closed 
State-Changed-By: phk 
State-Changed-When: Wed May 23 14:25:00 PDT 2001 
State-Changed-Why:  
timed out 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=11266 
>Unformatted:
