From nobody@FreeBSD.org  Mon Jan  7 00:46:37 2002
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 9634C37B41B
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  7 Jan 2002 00:46:36 -0800 (PST)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.6/8.11.6) id g078kao61432;
	Mon, 7 Jan 2002 00:46:36 -0800 (PST)
	(envelope-from nobody)
Message-Id: <200201070846.g078kao61432@freefall.freebsd.org>
Date: Mon, 7 Jan 2002 00:46:36 -0800 (PST)
From: Karsten Thygesen <kay@sonofon.dk>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Panic: vm_page_unwire: invalid wire count: 0
X-Send-Pr-Version: www-1.0

>Number:         33637
>Category:       kern
>Synopsis:       Panic: vm_page_unwire: invalid wire count: 0
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jan 07 00:50:01 PST 2002
>Closed-Date:    Sun Jul 14 17:58:45 PDT 2002
>Last-Modified:  Sun Jul 14 17:58:45 PDT 2002
>Originator:     Karsten Thygesen
>Release:        4.5-prerelease (CVS pr. 2001-12-25)
>Organization:
Sonofon
>Environment:
FreeBSD abnew01.sonofon.dk 4.5-PRERELEASE FreeBSD 4.5-PRERELEASE #6: Wed Dec 26 00:58:04 CET 2001     root@abnew01.sonofon.dk:/usr/obj/usr/src/sys/ABNEW01  i386
>Description:
Server crashes after 3-7 days of uptime. It's a 4-CPU Compaq ProLiant server with 3 GB of memory and 1.8 TB of SCSI disks. It runs as a (Diablo) news server under medium load. The error message is:

panic: vm_page_unwire: invalid wire count: 0
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000; 
boot() called on cpu#1

syncing disks... 234 36 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
giving up on 2 buffers
Uptime: 3d23h57m20s
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#1
cpu_reset: Stopping other CPUs
cpu_reset: Restarting BSP
cpu_reset_proxy: Grabbed mp lock for BSP
cpu_reset_proxy: Stopped CPU 1
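
For context, the panic string means that vm_page_unwire() was called on a page whose wire count was already zero, i.e. an unwire without a matching wire. A minimal user-space sketch of that invariant follows. The names `struct page`, `page_wire()` and `page_unwire()` are hypothetical and chosen for illustration only; this is not the FreeBSD kernel code, and it returns an error where the kernel would panic.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for a VM page (not struct vm_page). */
struct page {
    unsigned int wire_count;    /* number of outstanding wirings */
};

/* Wire the page: every call must later be matched by one page_unwire(). */
void page_wire(struct page *p)
{
    p->wire_count++;
}

/*
 * Unwire the page. Returns 0 on success, or -1 if the count is already
 * zero, which is the condition the kernel reports by panicking.
 */
int page_unwire(struct page *p)
{
    if (p->wire_count == 0) {
        fprintf(stderr, "page_unwire: invalid wire count: %u\n",
            p->wire_count);
        return -1;
    }
    p->wire_count--;
    return 0;
}
```

In this model, one wire followed by two unwires makes the second unwire fail; a kernel code path that releases a page twice, or two CPUs racing to release the same page, would look the same from the counter's point of view.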
>How-To-Repeat:
It has happened 3 times now.
>Fix:
None known
>Release-Note:
>Audit-Trail:

From: "Ted Mittelstaedt" <tedm@toybox.placo.com>
To: <freebsd-gnats-submit@FreeBSD.org>, <kay@sonofon.dk>
Cc:  
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 01:09:01 -0800

 What is the history of this system?  Has it run prior versions of
 FreeBSD without problems?
 
 Does this problem happen with a uniprocessor kernel?
 
 
 Ted Mittelstaedt                                       tedm@toybox.placo.com
 

From: Karsten Thygesen <KAY@sonofon.dk>
To: 'Ted Mittelstaedt' <tedm@toybox.placo.com>,
	freebsd-gnats-submit@FreeBSD.org, Karsten Thygesen <KAY@sonofon.dk>
Cc:  
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 10:58:09 +0100 

 Hi
 
 The server is a news server in production. It ran INN as its news server
 software for more than 6 months on FreeBSD-4.3-stable. Then I started to
 roll in Diablo (also news server software) on the same server, and I
 started to see crashes with the same error message, daily! I then updated
 to the latest 4.5 and the system became more stable again. I shut down INN
 completely and migrated 100% to Diablo, and now the system runs 3-7 days
 between crashes.
 
 I have not tried a uniprocessor kernel and as this is a production system,
 it's not that easy to try - further, I fear that a single CPU is not enough
 for the current load.
 
 Karsten
 

From: "Ted Mittelstaedt" <tedm@toybox.placo.com>
To: "Karsten Thygesen" <KAY@sonofon.dk>,
	<freebsd-gnats-submit@FreeBSD.org>
Cc:  
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 02:38:47 -0800

 In summary:
 
 The server ran fine for 6 months using INN.
 
 The server is now broken running Diablo.
 
 Why is this a FreeBSD problem?  If it was fine without Diablo, and crashes
 with Diablo, then the problem is Diablo!!!!
 
 I'd recommend that this PR be suspended until such time that the Diablo
 developers have had a chance to respond to this, and explain why the problem
 is FreeBSD when the FreeBSD server only started crashing after Diablo was
 run on it.
 
 Ted Mittelstaedt                                       tedm@toybox.placo.com
 

From: Peter Pentchev <roam@ringlet.net>
To: Ted Mittelstaedt <tedm@toybox.placo.com>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 12:45:37 +0200

 On Wed, Jan 09, 2002 at 02:40:02AM -0800, Ted Mittelstaedt wrote:
 >
 >  In summary:
 >  
 >  The server ran fine for 6 months using INN.
 >  
 >  The server is now broken running Diablo.
 >  
 >  Why is this a FreeBSD problem?  If it was fine without Diablo, and crashes
 >  with
 >  Diablo, then the problem is Diablo!!!!
 >  
 >  I'd recommend that this PR be suspended until such time that the Diablo
 >  developers
 >  have had a chance to respond to this, and explain why the problem is FreeBSD
 >  when
 >  the FreeBSD server only started crashing after Diablo was run on it.
 
 An application should not cause a kernel panic if it only uses
 the system calls documented in section 2 or the library functions
 documented in section 3 of the manual.  I highly doubt that the Diablo
 developers are meddling with kernel structures directly, therefore
 it is indeed a FreeBSD problem if a kernel panic occurs.
 
 G'luck,
 Peter
 
 -- 
 What would this sentence be like if pi were 3?

From: Karsten Thygesen <KAY@sonofon.dk>
To: 'Ted Mittelstaedt' <tedm@toybox.placo.com>,
	Karsten Thygesen <KAY@sonofon.dk>, freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 12:37:32 +0100 

 It is a FreeBSD problem, as Diablo runs as an ordinary user without special
 privileges. No user program should be able to trigger a kernel fault unless
 it is the kernel's fault, right?
 
 You can not blame this on diablo....
 
 Karsten
 
 

From: "Ted Mittelstaedt" <tedm@toybox.placo.com>
To: "Peter Pentchev" <roam@ringlet.net>
Cc: <freebsd-gnats-submit@FreeBSD.org>
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Wed, 9 Jan 2002 04:22:39 -0800

 Any application is able to cause a kernel panic with just regular library
 calls.  Of course they shouldn't do it, but they can if they want.  This is
 one reason login.conf exists.
 
 The Diablo support forum is a more appropriate place to start your
 troubleshooting.  Matt Dillon (originator of Diablo) has done a lot of work
 in the FreeBSD virtual memory system and Diablo was originally developed on
 FreeBSD.
 
 Please, please, don't try to do an end-run around the Diablo support and
 development team, they really are your best resource for getting this
 fixed in a timely manner!
 
 
 Ted Mittelstaedt                                       tedm@toybox.placo.com
 
Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: sheldonh 
Responsible-Changed-When: Wed Jan 9 05:53:49 PST 2002 
Responsible-Changed-Why:  
Matt knows a thing or two about diablo _and_ the FreeBSD VM 
subsystem. :-) 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=33637 
State-Changed-From-To: open->feedback 
State-Changed-By: sheldonh 
State-Changed-When: Wed Jan 9 06:03:44 PST 2002 
State-Changed-Why:  
Please follow the advice given at the following web page to provide 
more detail: 

http://www.freebsd.org/FAQ/advanced.html#KERNEL-PANIC-TROUBLESHOOTING 

Please copy your feedback to <bug-followup@freebsd.org>, using the 
subject line of this message. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=33637 

From: Peter Pentchev <roam@ringlet.net>
To: Ted Mittelstaedt <tedm@toybox.placo.com>
Cc: gnb@itga.com.au, bug-followup@FreeBSD.ORG
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Fri, 11 Jan 2002 03:58:41 +0200

 On Fri, Jan 11, 2002 at 12:33:01AM -0800, Ted Mittelstaedt wrote:
 > >-----Original Message-----
 > >From: gnb@itga.com.au [mailto:gnb@itga.com.au]
 > >Sent: Thursday, January 10, 2002 2:42 PM
 > >To: Ted Mittelstaedt
 > >Cc: gnb@itga.com.au; freebsd-bugs@FreeBSD.ORG
 > >Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
 > >
 > >
 > >>  The submitter of the PR
 > >> was operating under a false assumption - that he could run any program and
 > >> assume that it was impossible for it to crash the OS.
 > >
 > >But I don't think that is a false assumtion.  On the contrary, that
 > >is exactly
 > >what I would expect and hope to find from a "production quality" OS like
 > >FreeBSD.
 > 
 > If you give a program life-and-death authority over the system and
 > the program crashes the system then that is hardly a bug in the system,
 > now is it?
 
 As Bill Fumerola said, running a process as root is not giving it
 life-or-death authority over the system.  In my message about
 section 2 and 3 of the manual, I stated my opinion that Diablo
 probably does not go around fudging kernel structures directly - THAT
 would be giving it life-or-death authority.  Part of the purpose
 of system and library calls is exactly to give the OS some opportunity
 to limit processes' ability to do damage by supplying incorrect data.
 
 > >
 > >IMO the submitter was being entirely reasonable in making that
 > >assumption - or
 > >at least, on finding a violation of that assumption, to report it
 > >and expect it
 > >to be treated as a bug.  (Even if the response is "we know it's a
 > >bug and it's
 > >hard to fix, here's a workaround using login.conf".)
 > >
 > 
 > But that WASN'T my response, re-read my response to the PR, I did
 > not tell him to fix his problem with login.conf.  I merely pointed him
 > to it because he stated:
 > 
 > "An application should not cause a kernel panic if it only uses
 >  the system calls documented in section 2 or the library functions
 >  documented in section 3 of the manual."
 > 
 > which is obviously incorrect, and if he read the manpage to login.conf
 > he would have realized this.
 
 And just for the record, this was not posted by the submitter,
 it was posted by myself; just BTW, I like to consider myself
 a FreeBSD Project member, albeit only a meager ports committer,
 which would once more indicate that your opinion is not really shared
 by all of the Project's members :)
 
 About the login.conf thing - yes, I know that a forkbomb or an excessive
 memory allocation can crash FreeBSD.  But - apparently unlike you -
 I consider that to be an OS bug.  If a process (or processes) should
 decide to go haywire, the OS may be allowed to go down on its knees,
 slow down to a crawl, but it should NOT panic.  Thus, I maintain my
 opinion that a userland process should not be able to panic the OS,
 and that, consequently, this PR points out a problem in FreeBSD
 that happens to be triggered by the Diablo code.
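
The limits configured per login class in login.conf are applied to processes through setrlimit(2), and a program can lower its own limits the same way. A minimal sketch, assuming a hypothetical helper named `lower_soft_limit()` that is not part of any library:

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Lower the soft limit for `resource` to at most `cap`.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int lower_soft_limit(int resource, rlim_t cap)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    if (cap > rl.rlim_max)      /* the soft limit may not exceed the hard limit */
        cap = rl.rlim_max;
    if (rl.rlim_cur > cap)      /* only ever lower, never raise */
        rl.rlim_cur = cap;
    return setrlimit(resource, &rl);
}
```

For example, calling `lower_soft_limit(RLIMIT_NPROC, 128)` before spawning workers would bound how many processes a runaway fork loop could create for that user, and `RLIMIT_DATA` plays the same role for memory allocation. Such limits contain a misbehaving process; they do not excuse a kernel panic.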
 
 G'luck,
 Peter
 
 -- 
 You have, of course, just begun reading the sentence that you have just finished reading.

From: "Ted Mittelstaedt" <tedm@toybox.placo.com>
To: "Peter Pentchev" <roam@ringlet.net>
Cc: <gnb@itga.com.au>, <bug-followup@FreeBSD.ORG>
Subject: RE: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Fri, 11 Jan 2002 09:26:31 -0800

 >-----Original Message-----
 >From: Peter Pentchev [mailto:roam@ringlet.net]
 >Sent: Thursday, January 10, 2002 5:59 PM
 >To: Ted Mittelstaedt
 
 
 >
 >About the login.conf thing - yes, I know that a forkbomb or an excessive
 >memory allocation can crash FreeBSD.  But - apparently unlike you -
 >I consider that to be an OS bug.  If a process (or processes) should
 >decide to go haywire, the OS may be allowed to go down on its knees,
 >slow down to a crawl, but it should NOT panic.
 
 I don't know that there's much practical difference to the user between the
 system panicking and the system slowing to a crawl - both make the system
 unusable.
 
 I guess I'd say that if you're consistent you should be arguing that if a
 process goes haywire the system shouldn't panic, it should remain unaffected.
 
 I also agree that this should be a design goal of FreeBSD but I assume
 that perfection is impossible to achieve.  Therefore I allow that it's always
 going to be possible for an errant application program to crash the system.
 The difference between us is that I call that an application bug, you call
 that a kernel bug.
 
 >Thus, I maintain my
 >opinion that a userland process should not be able to panic the OS,
 >and that, consequently, this PR points out a problem in FreeBSD
 >that happens to be triggered by the Diablo code.
 >
 
 
 Of the various hypotheses I consider this to be the more likely, although
 I think the trigger is a combination of things of which Diablo is the major
 part.  But there's no guarantee that fixing the FreeBSD code is going to get
 the user going again, because if Diablo has a bug in it that is the trigger
 then Diablo is still going to have a bug in it which still may erupt.
 
 This is why one of my first suggestions was to try it with a uniprocessor
 kernel, which, if the user was willing to do it (he wasn't, reread the PR),
 might be the quickest bandaid fix, because if the problem only showed up in
 SMP mode then it would get him a stable Diablo server immediately.  (It also
 would be useful info to the kernel developer.)  The user also admitted he
 didn't know if SMP was a requirement or not in his application.  One of the
 cardinal rules of troubleshooting is to start by removing as much extraneous
 stuff as possible to break the system down into simple components and test
 them.
 
 Ted
 

From: Jung-uk Kim <jkim@niksun.com>
To: freebsd-gnats-submit@FreeBSD.org, kay@sonofon.dk
Cc:  
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Tue, 16 Apr 2002 12:55:49 -0400

 This patch fixed my problem. Can you try this?
 
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sys_pipe.c.diff?r1=1.60.2.11&r2=1.60.2.12&only_with_tag=RELENG_4&f=h
 

From: Karsten Thygesen <kay@sonofon.dk>
To: freebsd-gnats-submit@FreeBSD.org, kay@sonofon.dk
Cc:  
Subject: Re: kern/33637: Panic: vm_page_unwire: invalid wire count: 0
Date: Fri, 31 May 2002 13:21:59 +0200

 Hi
 
 The patch solves the problem. The server has been stable for 6 weeks now!
 
 Thanks a lot!
 
 Karsten
 
State-Changed-From-To: feedback->closed 
State-Changed-By: mp 
State-Changed-When: Sun Jul 14 17:57:45 PDT 2002 
State-Changed-Why:  
The originator says the bug has been fixed. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=33637 
>Unformatted:
