From nobody@FreeBSD.org  Mon Jul  9 13:47:07 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 2462D37B403
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  9 Jul 2001 13:47:07 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.3/8.11.3) id f69Kl7u30744;
	Mon, 9 Jul 2001 13:47:07 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200107092047.f69Kl7u30744@freefall.freebsd.org>
Date: Mon, 9 Jul 2001 13:47:07 -0700 (PDT)
From: Rudi Mathijssen <r.mathijssen@iris-ict.nl>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Router/nameserver system crashes 2-3 times monthly
X-Send-Pr-Version: www-1.0

>Number:         28844
>Category:       kern
>Synopsis:       Router/nameserver system crashes 2-3 times monthly
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 09 13:50:01 PDT 2001
>Closed-Date:    Sun Jan 13 10:10:28 PST 2002
>Last-Modified:  Sun Jan 13 10:10:58 PST 2002
>Originator:     Rudi Mathijssen
>Release:        FreeBSD 4.2-RELEASE
>Organization:
Stichting IRIS
>Environment:
FreeBSD irsdevgate.irsdev.rechtsbijstand.net 4.2-RELEASE FreeBSD 4.2-RELEASE #0 Wed Apr 25 14:27:52 CEST 2001  root@irsdevgate.irsdev.rechtsbijstand.net:/usr/src/sys/compile/DEVGATE  i386
>Description:
System Compaq Proliant ML350 (600MHz Pentium III, 128MB RAM) has 6 100Mbit NICs: fxp0 (Intel Pro 10/100B/100+ Ethernet), de0..3 (SMC 9332BDT), xl0 (3Com 3c905B-TX Fast Etherlink XL). Each NIC is connected to a 3Com 100Mbit hub (half duplex!). Each hub is wired to a second Proliant (traffic analyser and also cold standby for the gateway) and to a 3Com switch model 3300, to which the NT, SCO, and Tru64 servers and PC hubs are connected. Three of the six LANs are connected to a Cisco WAN router.
Disk configuration: RAID-1 setup with 2 9.1GB WU2 SCSI disks on a Compaq Smart Array 3200.
The custom kernel is ipfw enabled:
options         IPFIREWALL
options         IPFIREWALL_DEFAULT_TO_ACCEPT
options         DUMMYNET
but firewall_enable is currently set to "NO".
The system used to run FreeBSD 4.0, without the raid controller, and there was not a single problem. Since the "upgrade" it crashes irregularly 2-3 times per month. There is no crash dump or anything in /var/log. I have no screen dump of the panic screen (should we have a camera in the computer room?). After a few minutes the system resets itself. At times there are notifications in /var/log/messages like:
Jul  3 01:43:31 irsdevgate /kernel: de1: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
Jul  5 09:29:42 irsdevgate /kernel: xl0: no memory for rx list -- packet dropped!
There is, however, no correlation between the occurrence of these messages and the time of crashing.
>How-To-Repeat:
There appears (as yet) to be no link between the moment of crashing and external (network) or internal (e.g. cron) activity. 
>Fix:

>Release-Note:
>Audit-Trail:

From: Rudi Mathijssen <R.Mathijssen@iris-ict.nl>
To: "'freebsd-gnats-submit@FreeBSD.org'" <freebsd-gnats-submit@FreeBSD.org>
Cc: "'murray@stokely.org'" <murray@stokely.org>,
	"'kruse@kruse-ict.nl'" <kruse@kruse-ict.nl>
Subject: Re: kern/28844: Router/nameserver system crashes 2-3 times monthly
Date: Sun, 4 Nov 2001 10:59:37 +0100 

 Modifications tested: (1) removed xl and (suspect) fxp, added more SMC
 cards, now we have six interfaces de0-de5. These all run half-duplex
 100Mbps. (2) Furthermore, as netstat -m showed that the peak use of mbuf
 clusters (944) came awfully close to 1024 (default), NMBCLUSTERS=4096 was
 set. After a flawless operation from 4-sep-2001 on, it crashed again on
 oct-29 and oct-31 (the panic message is: page fault in kernel mode). This
 is not acceptable. Should we upgrade to 4.4? Go back to 4.0? Is there a
 special kernel param NO_PANIC which should be set to 1?
 I stress, this is not a test lab, it's a production environment. If FreeBSD
 is not suitable for this, please tell me.
 
 Rudi Mathijssen
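
For reference, the mbuf cluster limit mentioned in point (2) is a compile-time option in a FreeBSD 4.x kernel configuration file. A minimal sketch, assuming it is added to the DEVGATE configuration named in the Environment field (the rest of that file is not shown in this report):

```
# Kernel config fragment (sketch): raise the mbuf cluster limit
# from the default the submitter observed (1024) to 4096.
options         NMBCLUSTERS=4096
```

The option only takes effect after the kernel is rebuilt and the system rebooted; netstat -m should then show the new maximum next to the current and peak cluster counts.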
 

From: Murray Stokely <murray@FreeBSD.org>
To: Rudi Mathijssen <R.Mathijssen@iris-ict.nl>
Cc: "'freebsd-gnats-submit@FreeBSD.org'" <freebsd-gnats-submit@FreeBSD.org>,
	"'kruse@kruse-ict.nl'" <kruse@kruse-ict.nl>
Subject: Re: kern/28844: Router/nameserver system crashes 2-3 times monthly
Date: Mon, 5 Nov 2001 04:31:46 -0800

 On Sun, Nov 04, 2001 at 10:59:37AM +0100, Rudi Mathijssen wrote:
 > Modifications tested: (1) removed xl and (suspect) fxp, added more SMC
 > cards, now we have six interfaces de0-de5. These all run half-duplex
 > 100Mbps. (2) Furthermore, as netstat -m showed that the peak use of mbuf
 > clusters (944) came awfully close to 1024 (default), NMBCLUSTERS=4096 was
 > set. After a flawless operation from 4-sep-2001 on, it crashed again on
 > oct-29 and oct-31 (the panic message is: page fault in kernel mode). This
 > is not acceptable. Should we upgrade to 4.4? Go back to 4.0? Is there a
 > special kernel param NO_PANIC which should be set to 1?
 > I stress, this is not a test lab, it's a production environment. If FreeBSD
 > is not suitable for this, please tell me.
 
   Can you generate a backtrace of the kernel crash and post it to
 freebsd-stable@FreeBSD.org or freebsd-net@FreeBSD.org?
 
   This document should help narrow down the cause of the failure :
 
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html#AEN4392
 
   What version of FreeBSD are you running?  Does this problem still
 exist with 4.4 or 4.4-STABLE?
 
       - Murray
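
The original report notes that no crash dump was saved, so a backtrace first requires dumps to be enabled. A rough sketch of the /etc/rc.conf setting involved, assuming a swap partition at least as large as physical memory; the device name below is a hypothetical placeholder and must be replaced with the machine's real swap device:

```
# /etc/rc.conf fragment (sketch); /dev/da0s1b is a placeholder, not
# the actual swap device of the system described in this report.
dumpdev="/dev/da0s1b"   # device the kernel writes the dump to on panic
```

With this set, savecore(8) should copy the dump (typically into /var/crash) at the next boot. The kernel debugging chapter linked above describes how to extract a backtrace from it.
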
State-Changed-From-To: open->feedback 
State-Changed-By: iedowse 
State-Changed-When: Wed Nov 21 10:11:51 PST 2001 
State-Changed-Why:  

Based on the backtrace sent to -net, this is almost certainly caused 
by the icmp_error bug that was fixed just before 4.3-RELEASE. Can 
you confirm that this problem goes away if you upgrade to a more 
recent release? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=28844 
State-Changed-From-To: feedback->closed 
State-Changed-By: iedowse 
State-Changed-When: Sun Jan 13 10:10:28 PST 2002 
State-Changed-Why:  

Feedback timeout, and almost certainly fixed. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=28844 
>Unformatted:
