From nobody@FreeBSD.org  Wed Oct 14 22:38:26 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 945E010656A3
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 14 Oct 2009 22:38:26 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 82FFA8FC31
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 14 Oct 2009 22:38:26 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n9EMcQwM072664
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 14 Oct 2009 22:38:26 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n9EMcPbd072459;
	Wed, 14 Oct 2009 22:38:25 GMT
	(envelope-from nobody)
Message-Id: <200910142238.n9EMcPbd072459@www.freebsd.org>
Date: Wed, 14 Oct 2009 22:38:25 GMT
From: Andrew Brampton <brampton+freebsd@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Minidumps fail when many interrupts fire
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         139614
>Category:       amd64
>Synopsis:       [minidump] minidumps fail when many interrupts fire
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    avg
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Oct 14 22:40:02 UTC 2009
>Closed-Date:    Mon Jul 16 11:13:51 UTC 2012
>Last-Modified:  Mon Jul 16 11:13:51 UTC 2012
>Originator:     Andrew Brampton
>Release:        FreeBSD 7 and FreeBSD 8
>Organization:
>Environment:
>Description:
There have been at least two discussions on the FreeBSD mailing lists over
the past couple of years about minidumps failing because interrupts remain
enabled during the dump. I couldn't find an existing PR, so I'm creating
one to track this bug.

The problem is summed up by Ruslan Ermilov:
"Kernel minidumps on amd64 SMP can write beyond the bounds
of the configured dump device causing (as in our case) the
file system data following the swap partition to be overwritten
with the dump contents.

The problem is that while we're in the process of dumping
mapped physical pages via a bitmap (in minidump_machdep.c),
other CPUs continue to work and may modify page mappings of
processes.  This in turn causes the modifications to
pv_entries, which in turn modifies the bitmap of pages to
dump.  As a result, we can dump more pages than we've
calculated, and since dumps are written to the end of the
dump device, we may end up overwriting it.

The attached patch mitigates the problem, but the real solution
seems to be to disable interrupts (there's an XXX about this
in kern_shutdown.c before calling doadump()), and stopping
other CPUs, so we don't modify page tables while we're dumping."[1]
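
To make the failure mode concrete, here is a minimal C sketch of the race.
This is hypothetical illustration code, not the actual minidump_machdep.c;
the names (vm_page_dump, write_page) and sizes are made up. The dump size
is computed from the page bitmap up front, but the write loop re-reads the
live bitmap, so bits set concurrently by other CPUs add pages beyond the
space reserved at the end of the dump device:

  #include <stdint.h>

  #define BITMAP_WORDS 1024
  #define PAGE_SIZE    4096

  /* Stand-in for the live bitmap of physical pages to dump; other
   * CPUs keep flipping bits in it through pv_entry updates. */
  static volatile uint64_t vm_page_dump[BITMAP_WORDS];

  static void
  write_page(uint64_t pa)
  {
      /* In the kernel this would write the page to the dump device. */
      (void)pa;
  }

  static uint64_t
  count_dump_pages(void)
  {
      uint64_t count = 0;

      for (int i = 0; i < BITMAP_WORDS; i++)
          for (int bit = 0; bit < 64; bit++)
              if (vm_page_dump[i] & (1ULL << bit))
                  count++;
      return (count);
  }

  static void
  minidump_sketch(void)
  {
      /* The dump is positioned so that exactly `expected' pages
       * fit before the end of the dump device... */
      uint64_t expected = count_dump_pages();
      (void)expected;

      /* ...but this loop re-reads the live bitmap.  Every bit set
       * by another CPU after count_dump_pages() returned is an
       * extra page written past the end of the swap partition. */
      for (int i = 0; i < BITMAP_WORDS; i++)
          for (int bit = 0; bit < 64; bit++)
              if (vm_page_dump[i] & (1ULL << bit))
                  write_page((uint64_t)(i * 64 + bit) * PAGE_SIZE);
  }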

Expanding your swap space does not seem to avoid this problem[1], and it
seems to hit interrupt-heavy workloads hardest, such as servers with lots
of network traffic[2].

Hopefully someone will be able to find a suitable fix.

thanks

[1] http://lists.freebsd.org/pipermail/freebsd-current/2008-January/082752.html
[2] http://lists.freebsd.org/pipermail/freebsd-current/2008-June/086574.html
[3] http://lists.freebsd.org/pipermail/freebsd-current/2009-August/010599.html


>How-To-Repeat:
Panic the kernel while a device is generating lots of interrupts, for
example a network card receiving a steady stream of packets.
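One concrete way (assuming a kernel built with KDB) is to flood the
machine with network traffic from another host and then force a panic
by hand:

  sysctl debug.kdb.panic=1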
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-amd64->avg 
Responsible-Changed-By: avg 
Responsible-Changed-When: Sun Dec 5 12:46:53 UTC 2010 
Responsible-Changed-Why:  
I am working on the related issues. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=139614 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: amd64/139614: commit references a PR
Date: Sun, 11 Dec 2011 21:02:14 +0000 (UTC)

 Author: avg
 Date: Sun Dec 11 21:02:01 2011
 New Revision: 228424
 URL: http://svn.freebsd.org/changeset/base/228424
 
 Log:
   panic: add a switch and infrastructure for stopping other CPUs in SMP case
   
   Historical behavior of letting other CPUs merrily go on is the default
   for the time being.  The new behavior can be switched on via the
   kern.stop_scheduler_on_panic tunable and sysctl.
   
   Stopping of the CPUs has (at least) the following benefits:
   - more of the system state at panic time is preserved intact
   - threads and interrupts do not interfere with dumping of the system
     state
   
   Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
   is set.  That thread might call code that is also used in normal context
   and that code might use locks to prevent concurrent execution of certain
   parts.  Those locks might be held by the stopped threads and would never
   be released.  To work around this issue, it was decided that instead of
   explicit checks for panic context, we would rather put those checks
   inside the locking primitives.
   
   This change has substantial portions written and re-written by attilio
   and kib at various times.  Other changes are heavily based on the ideas
   and patches submitted by jhb and mdf.  bde has provided many insights
   into the details and history of the current code.
   
   The new behavior may cause problems for systems that use a USB keyboard
   for interfacing with the system console.  This is because of some unusual
   locking patterns in the ukbd code which have to be used because on one
   hand ukbd is below syscons, but on the other hand it has to interface
   with other usb code that uses regular mutexes/Giant for its concurrency
   protection.  Dumping to USB-connected disks may also be affected.
   
   PR:			amd64/139614 (at least)
   In cooperation with:	attilio, jhb, kib, mdf
   Discussed with:		arch@, bde
   Tested by:		Eugene Grosbein <eugen@grosbein.net>,
   			gnn,
   			Steven Hartland <killing@multiplay.co.uk>,
   			glebius,
   			Andrew Boyer <aboyer@averesystems.com>
   			(various versions of the patch)
   MFC after:		3 months (or never)
 
 Modified:
   head/sys/kern/kern_lock.c
   head/sys/kern/kern_mutex.c
   head/sys/kern/kern_rmlock.c
   head/sys/kern/kern_rwlock.c
   head/sys/kern/kern_shutdown.c
   head/sys/kern/kern_sx.c
   head/sys/kern/kern_synch.c
   head/sys/kern/subr_kdb.c
   head/sys/kern/subr_lock.c
   head/sys/kern/subr_witness.c
   head/sys/sys/lockstat.h
   head/sys/sys/mutex.h
   head/sys/sys/systm.h
 
 Modified: head/sys/kern/kern_lock.c
 ==============================================================================
 --- head/sys/kern/kern_lock.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/kern_lock.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -1232,6 +1232,9 @@ _lockmgr_disown(struct lock *lk, const c
  {
  	uintptr_t tid, x;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	tid = (uintptr_t)curthread;
  	_lockmgr_assert(lk, KA_XLOCKED | KA_NOTRECURSED, file, line);
  
 
 Modified: head/sys/kern/kern_mutex.c
 ==============================================================================
 --- head/sys/kern/kern_mutex.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/kern_mutex.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -192,6 +192,8 @@ void
  _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(m->mtx_lock != MTX_DESTROYED,
  	    ("mtx_lock() of destroyed mutex @ %s:%d", file, line));
 @@ -211,6 +213,9 @@ _mtx_lock_flags(struct mtx *m, int opts,
  void
  _mtx_unlock_flags(struct mtx *m, int opts, const char *file, int line)
  {
 +
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(m->mtx_lock != MTX_DESTROYED,
  	    ("mtx_unlock() of destroyed mutex @ %s:%d", file, line));
 @@ -232,6 +237,8 @@ void
  _mtx_lock_spin_flags(struct mtx *m, int opts, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(m->mtx_lock != MTX_DESTROYED,
  	    ("mtx_lock_spin() of destroyed mutex @ %s:%d", file, line));
 @@ -254,6 +261,8 @@ void
  _mtx_unlock_spin_flags(struct mtx *m, int opts, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(m->mtx_lock != MTX_DESTROYED,
  	    ("mtx_unlock_spin() of destroyed mutex @ %s:%d", file, line));
 @@ -282,6 +291,9 @@ mtx_trylock_flags_(struct mtx *m, int op
  #endif
  	int rval;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	MPASS(curthread != NULL);
  	KASSERT(m->mtx_lock != MTX_DESTROYED,
  	    ("mtx_trylock() of destroyed mutex @ %s:%d", file, line));
 @@ -338,6 +350,9 @@ _mtx_lock_sleep(struct mtx *m, uintptr_t
  	int64_t sleep_time = 0;
  #endif
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	if (mtx_owned(m)) {
  		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0,
  	    ("_mtx_lock_sleep: recursed on non-recursive mutex %s @ %s:%d\n",
 @@ -508,6 +523,9 @@ _mtx_lock_spin(struct mtx *m, uintptr_t 
  	uint64_t waittime = 0;
  #endif
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	if (LOCK_LOG_TEST(&m->lock_object, opts))
  		CTR1(KTR_LOCK, "_mtx_lock_spin: %p spinning", m);
  
 @@ -555,6 +573,10 @@ thread_lock_flags_(struct thread *td, in
  
  	i = 0;
  	tid = (uintptr_t)curthread;
 +
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	for (;;) {
  retry:
  		spinlock_enter();
 @@ -656,6 +678,9 @@ _mtx_unlock_sleep(struct mtx *m, int opt
  {
  	struct turnstile *ts;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	if (mtx_recursed(m)) {
  		if (--(m->mtx_recurse) == 0)
  			atomic_clear_ptr(&m->mtx_lock, MTX_RECURSED);
 
 Modified: head/sys/kern/kern_rmlock.c
 ==============================================================================
 --- head/sys/kern/kern_rmlock.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/kern_rmlock.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -344,6 +344,9 @@ _rm_rlock(struct rmlock *rm, struct rm_p
  	struct thread *td = curthread;
  	struct pcpu *pc;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	tracker->rmp_flags  = 0;
  	tracker->rmp_thread = td;
  	tracker->rmp_rmlock = rm;
 @@ -413,6 +416,9 @@ _rm_runlock(struct rmlock *rm, struct rm
  	struct pcpu *pc;
  	struct thread *td = tracker->rmp_thread;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	td->td_critnest++;	/* critical_enter(); */
  	pc = cpuid_to_pcpu[td->td_oncpu]; /* pcpu_find(td->td_oncpu); */
  	rm_tracker_remove(pc, tracker);
 @@ -432,6 +438,9 @@ _rm_wlock(struct rmlock *rm)
  	struct turnstile *ts;
  	cpuset_t readcpus;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	if (rm->lock_object.lo_flags & RM_SLEEPABLE)
  		sx_xlock(&rm->rm_lock_sx);
  	else
 @@ -486,6 +495,9 @@ _rm_wunlock(struct rmlock *rm)
  void _rm_wlock_debug(struct rmlock *rm, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	WITNESS_CHECKORDER(&rm->lock_object, LOP_NEWORDER | LOP_EXCLUSIVE,
  	    file, line, NULL);
  
 @@ -507,6 +519,9 @@ void
  _rm_wunlock_debug(struct rmlock *rm, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	curthread->td_locks--;
  	if (rm->lock_object.lo_flags & RM_SLEEPABLE)
  		WITNESS_UNLOCK(&rm->rm_lock_sx.lock_object, LOP_EXCLUSIVE,
 @@ -521,6 +536,10 @@ int
  _rm_rlock_debug(struct rmlock *rm, struct rm_priotracker *tracker,
      int trylock, const char *file, int line)
  {
 +
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	if (!trylock && (rm->lock_object.lo_flags & RM_SLEEPABLE))
  		WITNESS_CHECKORDER(&rm->rm_lock_sx.lock_object, LOP_NEWORDER,
  		    file, line, NULL);
 @@ -544,6 +563,9 @@ _rm_runlock_debug(struct rmlock *rm, str
      const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	curthread->td_locks--;
  	WITNESS_UNLOCK(&rm->lock_object, 0, file, line);
  	LOCK_LOG_LOCK("RMRUNLOCK", &rm->lock_object, 0, 0, file, line);
 
 Modified: head/sys/kern/kern_rwlock.c
 ==============================================================================
 --- head/sys/kern/kern_rwlock.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/kern_rwlock.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -233,6 +233,8 @@ void
  _rw_wlock(struct rwlock *rw, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(rw->rw_lock != RW_DESTROYED,
  	    ("rw_wlock() of destroyed rwlock @ %s:%d", file, line));
 @@ -249,6 +251,9 @@ _rw_try_wlock(struct rwlock *rw, const c
  {
  	int rval;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	KASSERT(rw->rw_lock != RW_DESTROYED,
  	    ("rw_try_wlock() of destroyed rwlock @ %s:%d", file, line));
  
 @@ -273,6 +278,8 @@ void
  _rw_wunlock(struct rwlock *rw, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(rw->rw_lock != RW_DESTROYED,
  	    ("rw_wunlock() of destroyed rwlock @ %s:%d", file, line));
 @@ -317,6 +324,9 @@ _rw_rlock(struct rwlock *rw, const char 
  	int64_t sleep_time = 0;
  #endif
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	KASSERT(rw->rw_lock != RW_DESTROYED,
  	    ("rw_rlock() of destroyed rwlock @ %s:%d", file, line));
  	KASSERT(rw_wowner(rw) != curthread,
 @@ -499,6 +509,9 @@ _rw_try_rlock(struct rwlock *rw, const c
  {
  	uintptr_t x;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	for (;;) {
  		x = rw->rw_lock;
  		KASSERT(rw->rw_lock != RW_DESTROYED,
 @@ -525,6 +538,9 @@ _rw_runlock(struct rwlock *rw, const cha
  	struct turnstile *ts;
  	uintptr_t x, v, queue;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	KASSERT(rw->rw_lock != RW_DESTROYED,
  	    ("rw_runlock() of destroyed rwlock @ %s:%d", file, line));
  	_rw_assert(rw, RA_RLOCKED, file, line);
 @@ -650,6 +666,9 @@ _rw_wlock_hard(struct rwlock *rw, uintpt
  	int64_t sleep_time = 0;
  #endif
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	if (rw_wlocked(rw)) {
  		KASSERT(rw->lock_object.lo_flags & LO_RECURSABLE,
  		    ("%s: recursing but non-recursive rw %s @ %s:%d\n",
 @@ -814,6 +833,9 @@ _rw_wunlock_hard(struct rwlock *rw, uint
  	uintptr_t v;
  	int queue;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	if (rw_wlocked(rw) && rw_recursed(rw)) {
  		rw->rw_recurse--;
  		if (LOCK_LOG_TEST(&rw->lock_object, 0))
 @@ -876,6 +898,9 @@ _rw_try_upgrade(struct rwlock *rw, const
  	struct turnstile *ts;
  	int success;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	KASSERT(rw->rw_lock != RW_DESTROYED,
  	    ("rw_try_upgrade() of destroyed rwlock @ %s:%d", file, line));
  	_rw_assert(rw, RA_RLOCKED, file, line);
 @@ -946,6 +971,9 @@ _rw_downgrade(struct rwlock *rw, const c
  	uintptr_t tid, v;
  	int rwait, wwait;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	KASSERT(rw->rw_lock != RW_DESTROYED,
  	    ("rw_downgrade() of destroyed rwlock @ %s:%d", file, line));
  	_rw_assert(rw, RA_WLOCKED | RA_NOTRECURSED, file, line);
 
 Modified: head/sys/kern/kern_shutdown.c
 ==============================================================================
 --- head/sys/kern/kern_shutdown.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/kern_shutdown.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -121,6 +121,11 @@ SYSCTL_INT(_kern, OID_AUTO, sync_on_pani
  	&sync_on_panic, 0, "Do a sync before rebooting from a panic");
  TUNABLE_INT("kern.sync_on_panic", &sync_on_panic);
  
 +static int stop_scheduler_on_panic = 0;
 +SYSCTL_INT(_kern, OID_AUTO, stop_scheduler_on_panic, CTLFLAG_RW | CTLFLAG_TUN,
 +    &stop_scheduler_on_panic, 0, "stop scheduler upon entering panic");
 +TUNABLE_INT("kern.stop_scheduler_on_panic", &stop_scheduler_on_panic);
 +
  static SYSCTL_NODE(_kern, OID_AUTO, shutdown, CTLFLAG_RW, 0,
      "Shutdown environment");
  
 @@ -138,6 +143,7 @@ SYSCTL_INT(_kern_shutdown, OID_AUTO, sho
   */
  const char *panicstr;
  
 +int stop_scheduler;			/* system stopped CPUs for panic */
  int dumping;				/* system is dumping */
  int rebooting;				/* system is rebooting */
  static struct dumperinfo dumper;	/* our selected dumper */
 @@ -294,10 +300,12 @@ kern_reboot(int howto)
  	 * systems don't shutdown properly (i.e., ACPI power off) if we
  	 * run on another processor.
  	 */
 -	thread_lock(curthread);
 -	sched_bind(curthread, 0);
 -	thread_unlock(curthread);
 -	KASSERT(PCPU_GET(cpuid) == 0, ("%s: not running on cpu 0", __func__));
 +	if (!SCHEDULER_STOPPED()) {
 +		thread_lock(curthread);
 +		sched_bind(curthread, 0);
 +		thread_unlock(curthread);
 +		KASSERT(PCPU_GET(cpuid) == 0, ("boot: not running on cpu 0"));
 +	}
  #endif
  	/* We're in the process of rebooting. */
  	rebooting = 1;
 @@ -547,13 +555,18 @@ panic(const char *fmt, ...)
  {
  #ifdef SMP
  	static volatile u_int panic_cpu = NOCPU;
 +	cpuset_t other_cpus;
  #endif
  	struct thread *td = curthread;
  	int bootopt, newpanic;
  	va_list ap;
  	static char buf[256];
  
 -	critical_enter();
 +	if (stop_scheduler_on_panic)
 +		spinlock_enter();
 +	else
 +		critical_enter();
 +
  #ifdef SMP
  	/*
  	 * We don't want multiple CPU's to panic at the same time, so we
 @@ -566,6 +579,22 @@ panic(const char *fmt, ...)
  		    PCPU_GET(cpuid)) == 0)
  			while (panic_cpu != NOCPU)
  				; /* nothing */
 +
 +	if (stop_scheduler_on_panic) {
 +		if (panicstr == NULL && !kdb_active) {
 +			other_cpus = all_cpus;
 +			CPU_CLR(PCPU_GET(cpuid), &other_cpus);
 +			stop_cpus_hard(other_cpus);
 +		}
 +
 +		/*
 +		 * We set stop_scheduler here and not in the block above,
 +		 * because we want to ensure that if panic has been called and
 +		 * stop_scheduler_on_panic is true, then stop_scheduler will
 +		 * always be set.  Even if panic has been entered from kdb.
 +		 */
 +		stop_scheduler = 1;
 +	}
  #endif
  
  	bootopt = RB_AUTOBOOT;
 @@ -604,7 +633,8 @@ panic(const char *fmt, ...)
  	/* thread_unlock(td); */
  	if (!sync_on_panic)
  		bootopt |= RB_NOSYNC;
 -	critical_exit();
 +	if (!stop_scheduler_on_panic)
 +		critical_exit();
  	kern_reboot(bootopt);
  }
  
 
 Modified: head/sys/kern/kern_sx.c
 ==============================================================================
 --- head/sys/kern/kern_sx.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/kern_sx.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -241,6 +241,8 @@ _sx_slock(struct sx *sx, int opts, const
  {
  	int error = 0;
  
 +	if (SCHEDULER_STOPPED())
 +		return (0);
  	MPASS(curthread != NULL);
  	KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
  	    ("sx_slock() of destroyed sx @ %s:%d", file, line));
 @@ -260,6 +262,9 @@ sx_try_slock_(struct sx *sx, const char 
  {
  	uintptr_t x;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	for (;;) {
  		x = sx->sx_lock;
  		KASSERT(x != SX_LOCK_DESTROYED,
 @@ -283,6 +288,8 @@ _sx_xlock(struct sx *sx, int opts, const
  {
  	int error = 0;
  
 +	if (SCHEDULER_STOPPED())
 +		return (0);
  	MPASS(curthread != NULL);
  	KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
  	    ("sx_xlock() of destroyed sx @ %s:%d", file, line));
 @@ -304,6 +311,9 @@ sx_try_xlock_(struct sx *sx, const char 
  {
  	int rval;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	MPASS(curthread != NULL);
  	KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
  	    ("sx_try_xlock() of destroyed sx @ %s:%d", file, line));
 @@ -330,6 +340,8 @@ void
  _sx_sunlock(struct sx *sx, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
  	    ("sx_sunlock() of destroyed sx @ %s:%d", file, line));
 @@ -345,6 +357,8 @@ void
  _sx_xunlock(struct sx *sx, const char *file, int line)
  {
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	MPASS(curthread != NULL);
  	KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
  	    ("sx_xunlock() of destroyed sx @ %s:%d", file, line));
 @@ -369,6 +383,9 @@ sx_try_upgrade_(struct sx *sx, const cha
  	uintptr_t x;
  	int success;
  
 +	if (SCHEDULER_STOPPED())
 +		return (1);
 +
  	KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
  	    ("sx_try_upgrade() of destroyed sx @ %s:%d", file, line));
  	_sx_assert(sx, SA_SLOCKED, file, line);
 @@ -399,6 +416,9 @@ sx_downgrade_(struct sx *sx, const char 
  	uintptr_t x;
  	int wakeup_swapper;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	KASSERT(sx->sx_lock != SX_LOCK_DESTROYED,
  	    ("sx_downgrade() of destroyed sx @ %s:%d", file, line));
  	_sx_assert(sx, SA_XLOCKED | SA_NOTRECURSED, file, line);
 @@ -481,6 +501,9 @@ _sx_xlock_hard(struct sx *sx, uintptr_t 
  	int64_t sleep_time = 0;
  #endif
  
 +	if (SCHEDULER_STOPPED())
 +		return (0);
 +
  	/* If we already hold an exclusive lock, then recurse. */
  	if (sx_xlocked(sx)) {
  		KASSERT((sx->lock_object.lo_flags & LO_RECURSABLE) != 0,
 @@ -681,6 +704,9 @@ _sx_xunlock_hard(struct sx *sx, uintptr_
  	uintptr_t x;
  	int queue, wakeup_swapper;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	MPASS(!(sx->sx_lock & SX_LOCK_SHARED));
  
  	/* If the lock is recursed, then unrecurse one level. */
 @@ -753,6 +779,9 @@ _sx_slock_hard(struct sx *sx, int opts, 
  	int64_t sleep_time = 0;
  #endif
  
 +	if (SCHEDULER_STOPPED())
 +		return (0);
 +
  	/*
  	 * As with rwlocks, we don't make any attempt to try to block
  	 * shared locks once there is an exclusive waiter.
 @@ -919,6 +948,9 @@ _sx_sunlock_hard(struct sx *sx, const ch
  	uintptr_t x;
  	int wakeup_swapper;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	for (;;) {
  		x = sx->sx_lock;
  
 
 Modified: head/sys/kern/kern_synch.c
 ==============================================================================
 --- head/sys/kern/kern_synch.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/kern_synch.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -158,7 +158,7 @@ _sleep(void *ident, struct lock_object *
  	else
  		class = NULL;
  
 -	if (cold) {
 +	if (cold || SCHEDULER_STOPPED()) {
  		/*
  		 * During autoconfiguration, just return;
  		 * don't run any other threads or panic below,
 @@ -260,7 +260,7 @@ msleep_spin(void *ident, struct mtx *mtx
  	KASSERT(p != NULL, ("msleep1"));
  	KASSERT(ident != NULL && TD_IS_RUNNING(td), ("msleep"));
  
 -	if (cold) {
 +	if (cold || SCHEDULER_STOPPED()) {
  		/*
  		 * During autoconfiguration, just return;
  		 * don't run any other threads or panic below,
 @@ -429,6 +429,8 @@ mi_switch(int flags, struct thread *newt
  	 */
  	if (kdb_active)
  		kdb_switch();
 +	if (SCHEDULER_STOPPED())
 +		return;
  	if (flags & SW_VOL) {
  		td->td_ru.ru_nvcsw++;
  		td->td_swvoltick = ticks;
 
 Modified: head/sys/kern/subr_kdb.c
 ==============================================================================
 --- head/sys/kern/subr_kdb.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/subr_kdb.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -226,13 +226,7 @@ kdb_sysctl_trap_code(SYSCTL_HANDLER_ARGS
  void
  kdb_panic(const char *msg)
  {
 -#ifdef SMP
 -	cpuset_t other_cpus;
  
 -	other_cpus = all_cpus;
 -	CPU_CLR(PCPU_GET(cpuid), &other_cpus);
 -	stop_cpus_hard(other_cpus);
 -#endif
  	printf("KDB: panic\n");
  	panic("%s", msg);
  }
 @@ -594,6 +588,9 @@ kdb_trap(int type, int code, struct trap
  	struct kdb_dbbe *be;
  	register_t intr;
  	int handled;
 +#ifdef SMP
 +	int did_stop_cpus;
 +#endif
  
  	be = kdb_dbbe;
  	if (be == NULL || be->dbbe_trap == NULL)
 @@ -606,9 +603,13 @@ kdb_trap(int type, int code, struct trap
  	intr = intr_disable();
  
  #ifdef SMP
 -	other_cpus = all_cpus;
 -	CPU_CLR(PCPU_GET(cpuid), &other_cpus);
 -	stop_cpus_hard(other_cpus);
 +	if (!SCHEDULER_STOPPED()) {
 +		other_cpus = all_cpus;
 +		CPU_CLR(PCPU_GET(cpuid), &other_cpus);
 +		stop_cpus_hard(other_cpus);
 +		did_stop_cpus = 1;
 +	} else
 +		did_stop_cpus = 0;
  #endif
  
  	kdb_active++;
 @@ -634,7 +635,8 @@ kdb_trap(int type, int code, struct trap
  	kdb_active--;
  
  #ifdef SMP
 -	restart_cpus(stopped_cpus);
 +	if (did_stop_cpus)
 +		restart_cpus(stopped_cpus);
  #endif
  
  	intr_restore(intr);
 
 Modified: head/sys/kern/subr_lock.c
 ==============================================================================
 --- head/sys/kern/subr_lock.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/subr_lock.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -532,6 +532,9 @@ lock_profile_obtain_lock_success(struct 
  	struct lock_profile_object *l;
  	int spin;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
 +
  	/* don't reset the timer when/if recursing */
  	if (!lock_prof_enable || (lo->lo_flags & LO_NOPROFILE))
  		return;
 @@ -596,6 +599,8 @@ lock_profile_release_lock(struct lock_ob
  	struct lpohead *head;
  	int spin;
  
 +	if (SCHEDULER_STOPPED())
 +		return;
  	if (lo->lo_flags & LO_NOPROFILE)
  		return;
  	spin = (LOCK_CLASS(lo)->lc_flags & LC_SPINLOCK) ? 1 : 0;
 
 Modified: head/sys/kern/subr_witness.c
 ==============================================================================
 --- head/sys/kern/subr_witness.c	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/kern/subr_witness.c	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -2162,6 +2162,13 @@ witness_save(struct lock_object *lock, c
  	struct lock_instance *instance;
  	struct lock_class *class;
  
 +	/*
 +	 * This function is used independently in locking code to deal with
 +	 * Giant, SCHEDULER_STOPPED() check can be removed here after Giant
 +	 * is gone.
 +	 */
 +	if (SCHEDULER_STOPPED())
 +		return;
  	KASSERT(witness_cold == 0, ("%s: witness_cold", __func__));
  	if (lock->lo_witness == NULL || witness_watch == -1 || panicstr != NULL)
  		return;
 @@ -2188,6 +2195,13 @@ witness_restore(struct lock_object *lock
  	struct lock_instance *instance;
  	struct lock_class *class;
  
 +	/*
 +	 * This function is used independently in locking code to deal with
 +	 * Giant, SCHEDULER_STOPPED() check can be removed here after Giant
 +	 * is gone.
 +	 */
 +	if (SCHEDULER_STOPPED())
 +		return;
  	KASSERT(witness_cold == 0, ("%s: witness_cold", __func__));
  	if (lock->lo_witness == NULL || witness_watch == -1 || panicstr != NULL)
  		return;
 
 Modified: head/sys/sys/lockstat.h
 ==============================================================================
 --- head/sys/sys/lockstat.h	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/sys/lockstat.h	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -185,17 +185,24 @@ extern uint64_t lockstat_nsecs(void);
  #define	LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(probe, lp, c, wt, f, l)  do {   \
  	uint32_t id;							     \
  									     \
 -    	lock_profile_obtain_lock_success(&(lp)->lock_object, c, wt, f, l);   \
 -	if ((id = lockstat_probemap[(probe)])) 			     	     \
 -		(*lockstat_probe_func)(id, (uintptr_t)(lp), 0, 0, 0, 0);     \
 +	if (!SCHEDULER_STOPPED()) {					     \
 +		lock_profile_obtain_lock_success(&(lp)->lock_object, c, wt,  \
 +		    f, l);						     \
 +		if ((id = lockstat_probemap[(probe)]))			     \
 +			(*lockstat_probe_func)(id, (uintptr_t)(lp), 0, 0,    \
 +			    0, 0);					     \
 +	}								     \
  } while (0)
  
  #define	LOCKSTAT_PROFILE_RELEASE_LOCK(probe, lp)  do {			     \
  	uint32_t id;							     \
  									     \
 -	lock_profile_release_lock(&(lp)->lock_object);			     \
 -	if ((id = lockstat_probemap[(probe)])) 			     	     \
 -		(*lockstat_probe_func)(id, (uintptr_t)(lp), 0, 0, 0, 0);     \
 +	if (!SCHEDULER_STOPPED()) {					     \
 +		lock_profile_release_lock(&(lp)->lock_object);		     \
 +		if ((id = lockstat_probemap[(probe)])) 		     	     \
 +			(*lockstat_probe_func)(id, (uintptr_t)(lp), 0, 0,    \
 +			    0, 0);					     \
 +	}								     \
  } while (0)
  
  #else	/* !KDTRACE_HOOKS */
 
 Modified: head/sys/sys/mutex.h
 ==============================================================================
 --- head/sys/sys/mutex.h	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/sys/mutex.h	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -370,7 +370,8 @@ do {									\
  									\
  	if (mtx_owned(&Giant)) {					\
  		WITNESS_SAVE(&Giant.lock_object, Giant);		\
 -		for (_giantcnt = 0; mtx_owned(&Giant); _giantcnt++)	\
 +		for (_giantcnt = 0; mtx_owned(&Giant) &&		\
 +		    !SCHEDULER_STOPPED(); _giantcnt++)			\
  			mtx_unlock(&Giant);				\
  	}
  
 
 Modified: head/sys/sys/systm.h
 ==============================================================================
 --- head/sys/sys/systm.h	Sun Dec 11 20:53:12 2011	(r228423)
 +++ head/sys/sys/systm.h	Sun Dec 11 21:02:01 2011	(r228424)
 @@ -47,6 +47,7 @@
  
  extern int cold;		/* nonzero if we are doing a cold boot */
  extern int rebooting;		/* kern_reboot() has been called. */
 +extern int stop_scheduler;	/* only one thread runs after panic */
  extern const char *panicstr;	/* panic message */
  extern char version[];		/* system version */
  extern char copyright[];	/* system copyright */
 @@ -109,6 +110,14 @@ enum VM_GUEST { VM_GUEST_NO = 0, VM_GUES
  	    ((uintptr_t)&(var) & (sizeof(void *) - 1)) == 0, msg)
  
  /*
 + * If we have already panic'd and this is the thread that called
 + * panic(), then don't block on any mutexes but silently succeed.
 + * Otherwise, the kernel will deadlock since the scheduler isn't
 + * going to run the thread that holds any lock we need.
 + */
 +#define	SCHEDULER_STOPPED() __predict_false(stop_scheduler)
 +
 +/*
   * XXX the hints declarations are even more misplaced than most declarations
   * in this file, since they are needed in one file (per arch) and only used
   * in two files.
 
State-Changed-From-To: open->analyzed 
State-Changed-By: avg 
State-Changed-When: Sun Jan 8 07:10:11 UTC 2012 
State-Changed-Why:  
The problem is understood and the fix is in head, 
though not enabled by default. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=139614 
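On a kernel with the fix, the new behavior can be enabled either at boot
or at runtime, since the knob is both a loader tunable and a read-write
sysctl (per the CTLFLAG_RW | CTLFLAG_TUN declaration in r228424 above):

  # /boot/loader.conf
  kern.stop_scheduler_on_panic=1

  # or at runtime
  sysctl kern.stop_scheduler_on_panic=1
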
State-Changed-From-To: analyzed->patched 
State-Changed-By: avg 
State-Changed-When: Mon Jan 16 11:25:16 UTC 2012 
State-Changed-Why:  
Should be fixed in head/CURRENT/10. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=139614 
State-Changed-From-To: patched->closed 
State-Changed-By: avg 
State-Changed-When: Mon Jul 16 11:12:35 UTC 2012 
State-Changed-Why:  
The problem is resolved in head and stable/9. 
A knob is provided for stable/8. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=139614 
>Unformatted:
