From jilles@stack.nl  Thu May  8 03:54:36 2003
Return-Path: <jilles@stack.nl>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6188937B401
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  8 May 2003 03:54:36 -0700 (PDT)
Received: from hexagon.stack.nl (hexagon.stack.nl [131.155.140.144])
	by mx1.FreeBSD.org (Postfix) with ESMTP id EA75643F85
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  8 May 2003 03:54:34 -0700 (PDT)
	(envelope-from jilles@stack.nl)
Received: by hexagon.stack.nl (Postfix, from userid 65534)
	id A36D71C4C; Thu,  8 May 2003 12:54:33 +0200 (CEST)
Received: from turtle.stack.nl (turtle.stack.nl [2001:610:1108:5010::132])
	by hexagon.stack.nl (Postfix) with ESMTP
	id B1CA81C5D; Thu,  8 May 2003 12:54:25 +0200 (CEST)
Received: by turtle.stack.nl (Postfix, from userid 1677)
	id A32321CC2D; Thu,  8 May 2003 12:54:25 +0200 (CEST)
Message-Id: <20030508105425.A32321CC2D@turtle.stack.nl>
Date: Thu,  8 May 2003 12:54:25 +0200 (CEST)
From: Jilles Tjoelker <jilles@stack.nl>
Reply-To: Jilles Tjoelker <jilles@stack.nl>
To: FreeBSD-gnats-submit@freebsd.org
Cc: svdb@stack.nl
Subject: panic: race condition with realitexpire() called for zombie
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         51964
>Category:       kern
>Synopsis:       panic: race condition with realitexpire() called for zombie
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu May 08 04:00:29 PDT 2003
>Closed-Date:    Mon Jul 28 12:05:46 PDT 2003
>Last-Modified:  Mon Jul 28 12:05:46 PDT 2003
>Originator:     Jilles Tjoelker
>Release:        FreeBSD 5.0-CURRENT i386
>Organization:
MCGV Stack
>Environment:
System: FreeBSD turtle.stack.nl 5.0-CURRENT FreeBSD 5.0-CURRENT #9: Wed Mar 26 16:42:07 CET 2003 jilles@turtle.stack.nl:/usr/obj/usr/src/sys/TURTLE i386

Dual Athlon XP 2000+
1 GB RAM

XFree86-Server-4.2.1_7

The test program was also tried on a uniprocessor machine running 4.6-RELEASE,
which did not crash. I have not been able to test it on a UP 5-CURRENT box.

Using SCHED_4BSD.

>Description:
When shutting down the X server, the machine sometimes panics. This has been
happening for about 2.5 months.

I have two traces: one with the X server (started via Xwrapper-4), and one with
the test program (called itimer).

There were similar panics earlier, but as there was no kernel.debug available,
little analysis was possible. A trap occured in realitexpire().

The function realitexpire() is called for a zombie process. Also, the fact that
p==NULL seems strange.

Script started on Thu Apr  3 12:54:18 2003
# gdb -k kernel.debug.3 vmcore.3
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic.id = 01000000
fault virtual address	= 0x0
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc01cfdb8
stack pointer	        = 0x10:0xdf0e5c24
frame pointer	        = 0x10:0xdf0e5c58
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 14 (swi7: tty:sio clock)
trap number		= 12
panic: page fault
cpuid = 0; lapic.id = 01000000
boot() called on cpu#0

syncing disks, buffers remaining... 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 
giving up on 223 buffers
Uptime: 5d21h3m49s
Dumping 1023 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]  672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008
---
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
239		dumping++;
(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
#1  0xc01ccdb1 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xc01cd137 in panic () at /usr/src/sys/kern/kern_shutdown.c:542
#3  0xc0334ae2 in trap_fatal (frame=0xdf0e5be4, eva=0)
    at /usr/src/sys/i386/i386/trap.c:843
#4  0xc0334792 in trap_pfault (frame=0xdf0e5be4, usermode=0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:757
#5  0xc0334282 in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -899772928, tf_esi = 1, tf_ebp = -552706984, tf_isp = -552707056, tf_ebx = 0, tf_edx = 0, tf_ecx = 8192, tf_eax = 8192, tf_trapno = 12, tf_err = 0, tf_eip = -1071841864, tf_cs = 8, tf_eflags = 66070, tf_esp = -899772640, tf_ss = 134217742})
    at /usr/src/sys/i386/i386/trap.c:444
#6  0xc031c808 in calltrap () at {standard input}:97
#7  0xc01dc0ec in realitexpire (arg=0xca5e8e00)
    at /usr/src/sys/kern/kern_time.c:530
#8  0xc01dc845 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:195
#9  0xc01b88b4 in ithread_loop (arg=0xc21d2180)
    at /usr/src/sys/kern/kern_intr.c:536
#10 0xc01b7743 in fork_exit (callout=0xc01b86e0 <ithread_loop>, arg=0x0, 
    frame=0x0) at /usr/src/sys/kern/kern_fork.c:791
(kgdb) frame 7
#7  0xc01dc0ec in realitexpire (arg=0xca5e8e00)
    at /usr/src/sys/kern/kern_time.c:530
530		psignal(p, SIGALRM);
(kgdb) print p
$1 = (struct proc *) 0x0
(kgdb) print *(struct proc *)arg
$2 = {p_list = {le_next = 0xc6d73800, le_prev = 0xc0392f58}, p_ksegrps = {
    tqh_first = 0xc7ed9180, tqh_last = 0xc7ed9184}, p_threads = {
    tqh_first = 0xc7e25780, tqh_last = 0xc7e25788}, p_suspended = {
    tqh_first = 0x0, tqh_last = 0xca5e8e18}, p_ucred = 0x0, p_fd = 0x0, 
  p_stats = 0xe5330a80, p_limit = 0x0, p_upages_obj = 0xca5e5390, 
  p_procsig = 0x0, p_flag = 24834, p_sflag = 1, p_state = PRS_ZOMBIE, 
  p_pid = 89957, p_hash = {le_next = 0x0, le_prev = 0xc21c8d94}, p_pglist = {
    le_next = 0x0, le_prev = 0xcaade008}, p_pptr = 0xca5e9c00, p_sibling = {
    le_next = 0x0, le_prev = 0xca5e9c64}, p_children = {lh_first = 0x0}, 
  p_mtx = {mtx_object = {lo_class = 0xc0392840, 
      lo_name = 0xc0367102 "process lock", 
      lo_type = 0xc0367102 "process lock", lo_flags = 4325376, lo_list = {
        tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, 
    mtx_lock = 3256699952, mtx_recurse = 0, mtx_blocked = {tqh_first = 0x0, 
      tqh_last = 0xca5e8e8c}, mtx_contested = {le_next = 0x0, 
      le_prev = 0xc7e257e8}}, p_oppid = 0, p_vmspace = 0x0, p_swtime = 117, 
  p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 20000}, it_value = {
      tv_sec = 507813, tv_usec = 320975}}, p_runtime = {sec = 3, 
    frac = 14595943908044818724}, p_uu = 3255398, p_su = 539957, p_iu = 1, 
  p_uticks = 416, p_sticks = 69, p_iticks = 0, p_profthreads = 0, 
  p_maxthrwaits = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, 
  p_siglist = {__bits = {0, 0, 0, 0}}, p_textvp = 0x0, p_lock = 0 '\0', 
  p_klist = {slh_first = 0x0}, p_sigiolst = {slh_first = 0x0}, 
---Type <return> to continue, or q <return> to quit---
  p_sigparent = 20, p_oldsigmask = {__bits = {0, 0, 0, 0}}, p_sig = 0, 
  p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', 
  p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, 
  p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {ss_sp = 0x0, ss_size = 0, 
    ss_flags = 4}, p_magic = 3203398350, 
  p_comm = "XFree86\0-4\0\0\0\0\0\0\0\0\0", p_pgrp = 0x0, 
  p_sysent = 0xc03ad5e0, p_args = 0x0, p_cpulimit = 900, p_xstat = 0, 
  p_numthreads = 1, p_numksegrps = 1, p_md = {md_ldt = 0x0}, p_itcallout = {
    c_links = {sle = {sle_next = 0xca5ea834}, tqe = {tqe_next = 0xca5ea834, 
        tqe_prev = 0xd288e680}}, c_time = 50769900, c_arg = 0xca5e8e00, 
    c_func = 0, c_flags = 2}, p_uarea = 0xe5330000, p_acflag = 0, p_ru = 0x0, 
  p_peers = 0x0, p_leader = 0xca5e8e00, p_emuldata = 0x0, p_label = {
    l_flags = 0, l_perpolicy = {{l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, 
        l_long = 0}, {l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, l_long = 0}}}, 
  p_sched = 0xca5e9000}
(kgdb) # 
# gdb -k kernel.debug.4 vmcore.4
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
cpuid = 1; lapic.id = 00000000
fault virtual address	= 0x0
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc01cfdb8
stack pointer	        = 0x10:0xdf0e5c24
frame pointer	        = 0x10:0xdf0e5c58
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 14 (swi7: tty:sio clock)
trap number		= 12
panic: page fault
cpuid = 1; lapic.id = 00000000
boot() called on cpu#1

syncing disks, buffers remaining... 6255 6255 6253 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 
giving up on 1878 buffers
Uptime: 1h25m40s
Dumping 1023 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008
---
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
239		dumping++;
(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
#1  0xc01ccdb1 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xc01cd137 in panic () at /usr/src/sys/kern/kern_shutdown.c:542
#3  0xc0334ae2 in trap_fatal (frame=0xdf0e5be4, eva=0)
    at /usr/src/sys/i386/i386/trap.c:843
#4  0xc0334792 in trap_pfault (frame=0xdf0e5be4, usermode=0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:757
#5  0xc0334282 in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -955814912, tf_esi = 1, tf_ebp = -552706984, tf_isp = -552707056, tf_ebx = 0, tf_edx = 0, tf_ecx = 8192, tf_eax = 8192, tf_trapno = 12, tf_err = 0, tf_eip = -1071841864, tf_cs = 8, tf_eflags = 66070, tf_esp = -955814624, tf_ss = 134217742})
    at /usr/src/sys/i386/i386/trap.c:444
#6  0xc031c808 in calltrap () at {standard input}:97
#7  0xc01dc0ec in realitexpire (arg=0xc7076c00)
    at /usr/src/sys/kern/kern_time.c:530
#8  0xc01dc845 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:195
#9  0xc01b88b4 in ithread_loop (arg=0xc21d2180)
    at /usr/src/sys/kern/kern_intr.c:536
#10 0xc01b7743 in fork_exit (callout=0xc01b86e0 <ithread_loop>, arg=0x0, 
    frame=0x0) at /usr/src/sys/kern/kern_fork.c:791
(kgdb) frame 7
#7  0xc01dc0ec in realitexpire (arg=0xc7076c00)
    at /usr/src/sys/kern/kern_time.c:530
530		psignal(p, SIGALRM);
(kgdb) print p
$1 = (struct proc *) 0x0
(kgdb) print *(struct proc *)arg
$2 = {p_list = {le_next = 0xc713ac00, le_prev = 0xc7101400}, p_ksegrps = {
    tqh_first = 0xc7074d80, tqh_last = 0xc7074d84}, p_threads = {
    tqh_first = 0xc70713c0, tqh_last = 0xc70713c8}, p_suspended = {
    tqh_first = 0x0, tqh_last = 0xc7076c18}, p_ucred = 0x0, p_fd = 0x0, 
  p_stats = 0xe516fa80, p_limit = 0x0, p_upages_obj = 0xc6dd3ab0, 
  p_procsig = 0x0, p_flag = 8194, p_sflag = 1, p_state = PRS_ZOMBIE, 
  p_pid = 8772, p_hash = {le_next = 0x0, le_prev = 0xc21c8910}, p_pglist = {
    le_next = 0xc708f400, le_prev = 0xc712c050}, p_pptr = 0xc712c000, 
  p_sibling = {le_next = 0xc708f400, le_prev = 0xc712c064}, p_children = {
    lh_first = 0x0}, p_mtx = {mtx_object = {lo_class = 0xc0392840, 
      lo_name = 0xc0367102 "process lock", 
      lo_type = 0xc0367102 "process lock", lo_flags = 4325376, lo_list = {
        tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, 
    mtx_lock = 3256699952, mtx_recurse = 0, mtx_blocked = {tqh_first = 0x0, 
      tqh_last = 0xc7076c8c}, mtx_contested = {le_next = 0x0, le_prev = 0x0}}, 
  p_oppid = 0, p_vmspace = 0x0, p_swtime = 0, p_realtimer = {it_interval = {
      tv_sec = 0, tv_usec = 10000}, it_value = {tv_sec = 5125, 
      tv_usec = 942046}}, p_runtime = {sec = 0, frac = 10478957466101902}, 
  p_uu = 0, p_su = 753, p_iu = 0, p_uticks = 0, p_sticks = 1, p_iticks = 0, 
  p_profthreads = 0, p_maxthrwaits = 0, p_traceflag = 0, p_tracevp = 0x0, 
  p_tracecred = 0x0, p_siglist = {__bits = {0, 0, 0, 0}}, p_textvp = 0x0, 
  p_lock = 0 '\0', p_klist = {slh_first = 0x0}, p_sigiolst = {
    slh_first = 0x0}, p_sigparent = 20, p_oldsigmask = {__bits = {0, 0, 0, 
---Type <return> to continue, or q <return> to quit---
      0}}, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', 
  p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, 
  p_suspcount = 0, p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {
    ss_sp = 0x0, ss_size = 0, ss_flags = 4}, p_magic = 3203398350, 
  p_comm = "itimer", '\0' <repeats 13 times>, p_pgrp = 0x0, 
  p_sysent = 0xc03ad5e0, p_args = 0x0, p_cpulimit = 900, p_xstat = 0, 
  p_numthreads = 1, p_numksegrps = 1, p_md = {md_ldt = 0x0}, p_itcallout = {
    c_links = {sle = {sle_next = 0xd28760d8}, tqe = {tqe_next = 0xd28760d8, 
        tqe_prev = 0xd289f328}}, c_time = 512385, c_arg = 0xc7076c00, 
    c_func = 0, c_flags = 2}, p_uarea = 0xe516f000, p_acflag = 1, p_ru = 0x0, 
  p_peers = 0x0, p_leader = 0xc7076c00, p_emuldata = 0x0, p_label = {
    l_flags = 0, l_perpolicy = {{l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, 
        l_long = 0}, {l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, l_long = 0}}}, 
  p_sched = 0xc7076e00}
(kgdb) # 
Script done on Thu Apr  3 13:24:03 2003

>How-To-Repeat:
The problem originally occured when the X server was shut down (not always, and
it seemed to depend on the way it was started). The program "itimer.c" (by
Serge van den Boom <svdb@stack.nl>) reproduces the panic on our system. It may
be necessary to change the delays somewhat. It will start 500 child processes.
Also, it may take 20 seconds or more before the panic occurs.

>Fix:
Probably revision 1.97 of src/sys/kern/kern_time.c introduced the problem as it
changed the locking for setitimer(), realitexpire(), etc.  kern_exit.c may be
related as well (exit1() stops the callout, perhaps it was already called in
another thread).

The bug may have been fixed already, but looking briefly in CVSweb, it does not
seem to be. The sources were cvsup'ed on March 26, just before compilation of
world and kernel (nightly cvsup was right in a commit of some ATM stuff and
didn't compile).

--- itimer.c begins here ---
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <signal.h>

int
main(int argc, char *argv[]) {
	struct itimerval tv =
	{
		{0, 1},
		{0, 1}
	};
	pid_t pid;
	int i;
	int waitval;
	
	while(1) {
	 	i = 500;
 		while (i--) {
 			pid = fork();
 			switch (pid) {
 				case -1:
 					perror("fork failed");
 					exit(1);
 				case 0: // child
 					signal(SIGALRM, SIG_IGN);
 					setitimer(ITIMER_REAL, &tv, NULL);
 					usleep(100);
 					return 0;
 				default: // parent
 					break;
 			}
 		}
 		
 		usleep(20);
 		i = 500;
 		while(i--)
 			wait(&waitval);
 	}
	return 0;
}
--- itimer.c ends here ---
>Release-Note:
>Audit-Trail:

From: Tor.Egge@cvsup.no.freebsd.org
To: jilles@stack.nl
Cc: FreeBSD-gnats-submit@freebsd.org, svdb@stack.nl
Subject: Re: kern/51964: panic: race condition with realitexpire() called
 for	zombie
Date: Sat, 10 May 2003 17:07:56 GMT

 The enclosed patch should fix the problem if your source is newer than
 "2003/05/01 14:16:38 PDT" (the struct proc mutex must be persistent).
 
 If callout_pending() is true then callout_reset() has been called.
 If callout_active() is false then callout_stop() has been called.
 
 ---------------------------
 Index: sys/kern/kern_time.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/kern_time.c,v
 retrieving revision 1.101
 diff -u -r1.101 kern_time.c
 --- sys/kern/kern_time.c	26 Feb 2003 17:16:38 -0000	1.101
 +++ sys/kern/kern_time.c	10 May 2003 16:44:41 -0000
 @@ -527,6 +527,12 @@
  
  	p = (struct proc *)arg;
  	PROC_LOCK(p);
 +	/* Check for callout being active */
 +	if (callout_pending(&p->p_itcallout) ||
 +	    !callout_active(&p->p_itcallout)) {
 +		PROC_UNLOCK(p);
 +		return;
 +	}
  	psignal(p, SIGALRM);
  	if (!timevalisset(&p->p_realtimer.it_interval)) {
  		timevalclear(&p->p_realtimer.it_value);
 ---------------------------
 
 - Tor Egge

From: Jilles Tjoelker <jilles@stack.nl>
To: Tor.Egge@cvsup.no.freebsd.org
Cc: FreeBSD-gnats-submit@freebsd.org, svdb@stack.nl, unix@stack.nl
Subject: Re: kern/51964: panic: race condition with realitexpire() called for	zombie
Date: Mon, 12 May 2003 16:43:25 +0200

 On Sat, May 10, 2003 at 05:07:56PM +0000, Tor.Egge@cvsup.no.freebsd.org wrote:
 > The enclosed patch should fix the problem if your source is newer than
 > "2003/05/01 14:16:38 PDT" (the struct proc mutex must be persistent).
 
 > If callout_pending() is true then callout_reset() has been called.
 > If callout_active() is false then callout_stop() has been called.
 
 I have upgraded the system and applied the patch. Brief testing
 indicates I can run the itimer program for more than 15 minutes without
 a crash.
 
 > ---------------------------
 > Index: sys/kern/kern_time.c
 > ===================================================================
 > RCS file: /home/ncvs/src/sys/kern/kern_time.c,v
 > retrieving revision 1.101
 > diff -u -r1.101 kern_time.c
 > --- sys/kern/kern_time.c	26 Feb 2003 17:16:38 -0000	1.101
 > +++ sys/kern/kern_time.c	10 May 2003 16:44:41 -0000
 > @@ -527,6 +527,12 @@
 >  
 >  	p = (struct proc *)arg;
 >  	PROC_LOCK(p);
 > +	/* Check for callout being active */
 > +	if (callout_pending(&p->p_itcallout) ||
 > +	    !callout_active(&p->p_itcallout)) {
 > +		PROC_UNLOCK(p);
 > +		return;
 > +	}
 >  	psignal(p, SIGALRM);
 >  	if (!timevalisset(&p->p_realtimer.it_interval)) {
 >  		timevalclear(&p->p_realtimer.it_value);
 > ---------------------------

From: Jilles Tjoelker <jilles@stack.nl>
To: Tor.Egge@cvsup.no.freebsd.org
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/51964: panic: race condition with realitexpire() called for	zombie
Date: Wed, 21 May 2003 12:26:00 +0200

 The system has not panicked since I upgraded and applied the patch
 (about 8 days ago), so it seems to be fixed.  The X server has been
 started and shut down many times (we use X -once -query ...).  The patch
 should perhaps be committed.
 
 The race condition (when callout_stop() is called, another thread may
 still be executing in the callout function) may be present in other
 parts of the system as well.
 
 Jilles

From: Tor.Egge@cvsup.no.freebsd.org
To: jilles@stack.nl
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/51964: panic: race condition with realitexpire() called
 for	zombie
Date: Wed, 21 May 2003 15:06:18 GMT

 > The system has not panicked since I upgraded and applied the patch
 > (about 8 days ago), so it seems to be fixed.  The X server has been
 > started and shut down many times (we use X -once -query ...).  The patch
 > should perhaps be committed.
 
 I've been told that a hopefully better fix (without overhead in the
 common case) is being worked on.
 
 > The race condition (when callout_stop() is called, another thread may
 > still be executing in the callout function) may be present in other
 > parts of the system as well.
 
 Correct.
 
 - Tor Egge

From: Jilles Tjoelker <jilles@stack.nl>
To: freebsd-gnats-submit@freebsd.org
Cc: unix@stack.nl
Subject: kern/51964
Date: Mon, 28 Jul 2003 17:26:09 +0200

 Some weeks ago, a different solution was committed (kern_exit.c rev
 1.215 and others).
 
 This fixes the problem, please close this PR.
 
 Jilles
State-Changed-From-To: open->closed 
State-Changed-By: ceri 
State-Changed-When: Mon Jul 28 12:05:20 PDT 2003 
State-Changed-Why:  
The submitter reports that this problem has now been corrected. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=51964 
>Unformatted:
