From nobody@FreeBSD.org  Wed Jan  7 11:26:52 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EAD9C1065672
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  7 Jan 2009 11:26:51 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id CE46D8FC14
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  7 Jan 2009 11:26:51 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n07BQpnW089668
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 7 Jan 2009 11:26:51 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n07BQpwD089667;
	Wed, 7 Jan 2009 11:26:51 GMT
	(envelope-from nobody)
Message-Id: <200901071126.n07BQpwD089667@www.freebsd.org>
Date: Wed, 7 Jan 2009 11:26:51 GMT
From: Yvan Seth <Yvan.Seth@Zeus.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kernel panic in/below sys_pipe.c:knlist_cleardel
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         130261
>Category:       kern
>Synopsis:       [kernel] [panic] kernel panic in/below sys_pipe.c:knlist_cleardel
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jan 07 11:30:01 UTC 2009
>Closed-Date:    
>Last-Modified:  Thu Jan 08 08:36:46 UTC 2009
>Originator:     Yvan Seth
>Release:        6.4-RELEASE-p1
>Organization:
Zeus Technologies
>Environment:
FreeBSD dante.cam.zeus.com 6.4-RELEASE-p1 FreeBSD 6.4-RELEASE-p1 #0: Sat Jan  3 10:02:24 GMT 2009     root@dante.cam.zeus.com:/usr/obj/usr/src/sys/DEBUG  i386


>Description:
FreeBSD 6.4-RELEASE-p1 panics regularly below sys_pipe.c:knlist_cleardel.

System details:
# uname -a
FreeBSD dante.cam.zeus.com 6.4-RELEASE-p1 FreeBSD 6.4-RELEASE-p1 #5: Mon Jan  5 23:37:03 GMT 2009     root@dante.cam.zeus.com:/usr/obj/usr/src/sys/DEBUG  i386
# dmesg | grep -i cpu
CPU: Intel Celeron (697.74-MHz 686-class CPU)
cpu0: <ACPI CPU> on acpi0
# dmesg | grep -i memory
real memory  = 267124736 (254 MB)
avail memory = 247779328 (236 MB)

This occurs with both downloaded binary kernels, as well as a self-compiled debug kernel.  The kernel source used is vanilla FreeBSD.  (We originally hit this on an old 6.1 machine, and have upgraded through 6.1-p24, 6.2-RELEASE, and finally 6.4-p1 - the issue is present in all versions in this list - I haven't got the time/resources to test with 7.1 at the moment.)  My compiled DEBUG kernel config is GENERIC with these added lines:
   options DDB
   options KDB
   options GDB
   options KDB_TRACE

The problem has occurred on two different systems, the (rather old) Celeron one detailed above and another AMD Opteron 250 (2.4GHz) with 2GB RAM.  Both have been through, and passed, a full memcheck.

The panic closely resembles an issue, "knlsit_cleardel() panic", discussed here:
 http://lists.freebsd.org/pipermail/freebsd-current/2008-March/thread.html#83991
This email thread seems to come to a sudden close after the OP suggests a fix.

The panic always happens under knlist_cleardel (kern/kern_event.c), reached via pipeclose in sys_pipe.c, and so far (about 15 noted panics) has always occurred in a perl process.  However, there's a lot of perl code running on the system, so this latter point is probably a red herring.

I have crash dumps and am willing to delve into them as much as requested.  I have wandered around one of the crash dumps in kgdb, and include below as much data as seems immediately relevant.  However I'm no kernel hacker, most certainly when it comes to FreeBSD - I can just hope some of this is useful.

For starters here is a typical crash dump, loaded into kgdb:

------------------------------
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x1c
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc069c946
stack pointer           = 0x28:0xd1898bc8
frame pointer           = 0x28:0xd1898bd8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2537 (perl)
panic: from debugger
KDB: stack backtrace:
Uptime: 20h44m58s
Dumping 254 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 254MB (64960 pages) 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

Reading symbols from /boot/kernel/acpi.ko...done.
Loaded symbols for /boot/kernel/acpi.ko
Reading symbols from /boot/kernel/linux.ko...done.
Loaded symbols for /boot/kernel/linux.ko
#0  doadump () at pcpu.h:165
165   pcpu.h: No such file or directory.
   in pcpu.h
(kgdb) bt
   <SNIP content=trap/doadump calls/>
#12 0xc069c946 in knlist_cleardel (knl=0xc2dcd208, td=0x0, islocked=1, killkn=0) at atomic.h:149
#13 0xc06e2597 in pipeclose (cpipe=0xc2dcd198) at /usr/src/sys/kern/sys_pipe.c:1526
#14 0xc06e2216 in pipe_close (fp=0x4, td=0xc343e180) at /usr/src/sys/kern/sys_pipe.c:1443
#15 0xc06980d8 in fdrop_locked (fp=0xc32e1ca8, td=0xc343e180) at file.h:296
#16 0xc0698001 in fdrop (fp=0xc32e1ca8, td=0xc343e180) at /usr/src/sys/kern/kern_descrip.c:2173
#17 0xc069662f in closef (fp=0xc32e1ca8, td=0xc343e180) at /usr/src/sys/kern/kern_descrip.c:1993
#18 0xc06939c3 in kern_close (td=0xc343e180, fd=5) at /usr/src/sys/kern/kern_descrip.c:1083
#19 0xc06937b4 in close (td=0xc343e180, uap=0x4) at /usr/src/sys/kern/kern_descrip.c:1035
#20 0xc0937903 in syscall (frame=
      {tf_fs = 59, tf_es = 138281019, tf_ds = -1078001605, tf_edi = 0, tf_esi = 673782480, tf_ebp = -1077943144, tf_isp = -779514524, tf_ebx = 673694016, tf_edx = 0, tf_ecx = 0, tf_eax = 6, tf_trapno = 0, tf_err = 2, tf_eip = 673632627, tf_cs = 51, tf_eflags = 530, tf_esp = -1077943172, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:984
#21 0xc09238df in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#22 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)
---------------------------------------------------

The PC is at:

---------------------------------------------------
0xc069c946 <knlist_cleardel+66>: cmpxchg %edx,0x1c(%ebx)
---------------------------------------------------

This instruction is generated by a macro in knlist_cleardel:

---------------------------------------------------
kern/kern_event.c:1721:      KQ_LOCK(kq);
---------------------------------------------------

The value in question is (*):

---------------------------------------------------
(kgdb) print kq
$16 = (struct kqueue *) 0x0
(kgdb) print &(kq)->kq_lock
$17 = (struct mtx *) 0x0
(kgdb) print &(&(kq)->kq_lock)->mtx_lock  <--;---<<<< *
$18 = (volatile uintptr_t *) 0x1c  <--------'
---------------------------------------------------

The 0x1c address is what gets passed down from KQ_LOCK to the underlying primitive:

---------------------------------------------------
i386/include/atomic.h:atomic_cmpset_int(volatile u_int *dst, ...)
---------------------------------------------------

Where the dst value passed in, 0x1c, is accessed by code equivalent to the PC:

---------------------------------------------------
   __asm __volatile (
   "  " __XSTRING(MPLOCKED) " "
>How-To-Repeat:
When our software test system is running this occurs regularly, sometimes a couple of times per day.  At best the machines last 48 hours between panics.

When no tests are running the machines seem to be fine.

The software test system uses a lot of perl code.  Much of that perl executes external programs (open X, "|foo"), so I presume there is heavy pipe creation and close activity.

Unfortunately I have not been able to isolate the issue to a simple test case.  I'm working on finding a simple way to replicate, but haven't made much progress on this.
>Fix:


>Release-Note:
>Audit-Trail:

From: Yvan Seth <Yvan.Seth@Zeus.com>
To: bug-followup@FreeBSD.org, Yvan.Seth@Zeus.com
Cc:  
Subject: Re: kern/130261: kernel panic in/below sys_pipe.c:knlist_cleardel
Date: Wed, 7 Jan 2009 12:37:51 +0000

 I should clarify: "We originally hit this on an old 6.1 machine, and
 have upgraded through 6.1-p24, 6.2-RELEASE, and finally 6.4-p1"
 
 We:
 * did a 6.1-RELEASE install and it panicked as described.
 * upgraded it to 6.1-RELEASE-p24 and the panic persisted.
 * did a fresh 6.2-RELEASE install, panic again.
 * upgraded this to 6.2-RELEASE-p9 (customer version), panic.
 * did a fresh 6.4-RELEASE install, panic.
 * upgraded this fresh install to 6.4-RELEASE-p1, panic.
 
 The important point is that we didn't upgrade between versions; 6.4
 was a fresh install.
 
 Thanks,
 Yvan

From: Yvan Seth <Yvan.Seth@Zeus.com>
To: bug-followup@FreeBSD.org, Yvan.Seth@Zeus.com
Cc:  
Subject: Re: kern/130261: kernel panic in/below sys_pipe.c:knlist_cleardel
Date: Wed, 7 Jan 2009 15:12:37 +0000

 In trying to replicate this more simply (still using our complex test
 scripts unfortunately) I'm seeing some slightly different panics.
 
 I've seen the following one just a couple of times before, but figure it
 must be related as it is also under knlist_cleardel.  To my untrained
 eye things look to be in an even worse state in this case: should
 knl->kl_list.slh_first->kn_kq->kq_lock.mtx_lock ever have a value of
 0x06?  In every occurrence of this form of the panic it has the value
 0x06, so this does not look like random clobbering.
 
 The 'kq' is in state 0x10 - KQ_CLOSING
 The 'kn' has status 0x11 - KN_ACTIVE | KN_INFLUX
 
 Notably: 0x78 = 0x04+0x74 - i.e. "mov 0x74(%ecx),%eax"
 
 And: 0x04 = 0x06 & MTX_FLAGMASK (see #define mtx_owner)
 
 Perhaps: mtx_lock = MTX_UNOWNED | MTX_CONTESTED = MTX_DESTROYED
 
 
 More details:
 -----------------------------------------------------------------------
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0x78
 fault code             = supervisor read, page not present
 instruction pointer    = 0x20:0xc06dc281
 stack pointer          = 0x28:0xd184cb64
 frame pointer          = 0x28:0xd184cb68
 code segment           = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
 processor eflags       = resume, IOPL = 0
 current process        = 24539 (perl)
 panic: from debugger
 KDB: stack backtrace:
 Uptime: 2h42m13s
 <SNIP/>
 #10 0xc092388a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
 #11 0xc06dc281 in turnstile_setowner (ts=0xc25124c0, owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:456
 #12 0xc06dc5de in turnstile_wait (lock=0xc2ec6600, owner=0x4, queue=0) at /usr/src/sys/kern/subr_turnstile.c:661
 #13 0xc06b1a5e in _mtx_lock_sleep (m=0xc2ec6600, tid=3272086656, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:579
 #14 0xc069c961 in knlist_cleardel (knl=0xc27bdb98, td=0x0, islocked=1, killkn=0) at /usr/src/sys/kern/kern_event.c:1730
 #15 0xc06e2597 in pipeclose (cpipe=0xc27bdb28) at /usr/src/sys/kern/sys_pipe.c:1526
 #16 0xc06e2216 in pipe_close (fp=0xc30814a0, td=0xc3081480) at /usr/src/sys/kern/sys_pipe.c:1443
 #17 0xc06980d8 in fdrop_locked (fp=0xc2fa83a8, td=0xc3081480) at file.h:296
 #18 0xc0698001 in fdrop (fp=0xc2fa83a8, td=0xc3081480) at /usr/src/sys/kern/kern_descrip.c:2173
 #19 0xc069662f in closef (fp=0xc2fa83a8, td=0xc3081480) at /usr/src/sys/kern/kern_descrip.c:1993
 #20 0xc06939c3 in kern_close (td=0xc3081480, fd=5) at /usr/src/sys/kern/kern_descrip.c:1083
 #21 0xc06937b4 in close (td=0xc3081480, uap=0xc30814a0) at /usr/src/sys/kern/kern_descrip.c:1035
 #22 0xc0937903 in syscall (frame=
       {tf_fs = 59, tf_es = 140116027, tf_ds = -1078001605, tf_edi = 0, tf_esi = 673782480, tf_ebp = -1077942568, tf_isp = -779825820, tf_ebx = 673694016, tf_edx = 0, tf_ecx = 0, tf_eax = 6, tf_trapno = 12, tf_err = 2, tf_eip = 673632627, tf_cs = 51, tf_eflags = 530, tf_esp = -1077942596, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:984
 #23 0xc09238df in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
 #24 0x00000033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 <SNIP/>
 (kgdb) p/x *knl
 $2 = {
   kl_list = {
     slh_first = 0xc2859770
   }, 
   kl_lock = 0xc069c7ec, 
   kl_unlock = 0xc069c820, 
   kl_locked = 0xc069c85c, 
   kl_lockarg = 0xc27bdc98
 }
 (kgdb) p/x *knl->kl_list.slh_first
 $3 = {
   kn_link = {
     sle_next = 0x0
   }, 
   kn_selnext = {
     sle_next = 0x0
   }, 
   kn_knlist = 0xc27bdb98, 
   kn_tqe = {
     tqe_next = 0x0, 
     tqe_prev = 0xc285a848
   }, 
   kn_kq = 0xc2ec6600, 
   kn_kevent = {
     ident = 0x1, 
     filter = 0xfffe, 
     flags = 0x0, 
     fflags = 0x0, 
     data = 0x4000, 
     udata = 0x0
   }, 
   kn_status = 0x11, 
   kn_sfflags = 0x0, 
   kn_sdata = 0x0, 
   kn_ptr = {
     p_fp = 0x0, 
     p_proc = 0x0
   }, 
   kn_fop = 0x0, 
   kn_hook = 0x0
 }
 (kgdb) p/x *knl->kl_list.slh_first->kn_kq
 $4 = {
   kq_lock = {
     mtx_object = {
       lo_class = 0xc0a32c84, 
       lo_name = 0xc09b8585, 
       lo_type = 0xc09b8585, 
       lo_flags = 0x420000, 
       lo_list = {
         tqe_next = 0x0, 
         tqe_prev = 0x0
       }, 
       lo_witness = 0x0
     }, 
     mtx_lock = 0x6,   <<<<======================= ???????
     mtx_recurse = 0x0
   }, 
   kq_refcnt = 0x1, 
   kq_list = {
     sle_next = 0x0
   }, 
   kq_head = {
     tqh_first = 0x0, 
     tqh_last = 0xc2ec662c
   }, 
   kq_count = 0x0, 
   kq_sel = {
     si_thrlist = {
       tqe_next = 0x0, 
       tqe_prev = 0x0
     }, 
     si_thread = 0x0, 
     si_note = {
       kl_list = {
         slh_first = 0x0
       }, 
       kl_lock = 0x0, 
       kl_unlock = 0x0, 
       kl_locked = 0xc069c85c, 
       kl_lockarg = 0x0
     }, 
     si_flags = 0x0
   }, 
   kq_sigio = 0x0, 
   kq_fdp = 0x0, 
   kq_state = 0x10, 
   kq_knlistsize = 0x100, 
   kq_knlist = 0xc268e800, 
   kq_knhashmask = 0x0, 
   kq_knhash = 0x0, 
   kq_task = {
     ta_link = {
       stqe_next = 0x0
     }, 
     ta_pending = 0x0, 
     ta_priority = 0x0, 
     ta_func = 0xc069b788, 
     ta_context = 0xc2ec6600
   }
 }
 -----------------------------------------------------------------------
 
 Regards,
 -Yvan
 
>Unformatted:
 >>>"  cmpxchgl %2,%1 ;  " <<<
    "       setz   %%al ;      "
    "  movzbl   %%al,%0 ;   "
 ---------------------------------------------------
 
 Anyway, presumably 0x1c is the offset of the lock word from the start of the relevant struct, 'kq', so this struct pointer being null looks like the problem.  Why and how has the 'kq' value in 'knlist_cleardel' been permitted to be 0x0?  The 'kq' value is 'kn->kn_kq', and 'kn' looks somewhat dead or useless:
 
 ----------------------------------------------
 (kgdb) p *kn
 $57 = {
   kn_link = {
     sle_next = 0x0
   },
   kn_selnext = {
     sle_next = 0x0
   },
   kn_knlist = 0x0,
   kn_tqe = {
     tqe_next = 0x0,
     tqe_prev = 0x0
   },
   kn_kq = 0x0,
   kn_kevent = {
     ident = 0,
     filter = 0,
     flags = 0,
     fflags = 0,
     data = 0,
     udata = 0x0
   },
   kn_status = 32,  <--- KN_MARKER (ignore this knote)
   kn_sfflags = 0,
   kn_sdata = 0,
   kn_ptr = {
     p_fp = 0x0,
     p_proc = 0x0
   },
   kn_fop = 0x0,
   kn_hook = 0x0
 }
 ----------------------------------------------
 
 KN_MARKER looks suspicious, and matches up with content in the mailing list discussion referenced earlier.  The 'kn' value, in turn, comes from knl->kl_list, and knl is a function argument that looks like:
 
 ----------------------------------------------
 (kgdb) p *knl
 $58 = {
   kl_list = {
     slh_first = 0xc39cd908
   },
   kl_lock = 0xc069c7ec <knlist_mtx_lock>,
   kl_unlock = 0xc069c820 <knlist_mtx_unlock>,
   kl_locked = 0xc069c85c <knlist_mtx_locked>,
   kl_lockarg = 0xc2dcd308
 }
 ----------------------------------------------
 
 Up another level in 'pipeclose' our knlist_cleardel call comes from:
 
 ----------------------------------------------
    knlist_clear(&cpipe->pipe_sel.si_note, 1);
 ----------------------------------------------
 
 Inspecting 'cpipe' I have:
 
 -----------------------------------------------
 (kgdb) p *cpipe
 $60 = {
   pipe_buffer = {
     cnt = 0,
     in = 0,
     out = 0,
     size = 16384,
     buffer = 0x0
   },
   pipe_map = {
     cnt = 0,
     pos = 0,
     npages = 0,
     ms =       {0x0 <repeats 17 times>}
   },
   pipe_sel = {
     si_thrlist = {
       tqe_next = 0x0,
       tqe_prev = 0xc343e1b0
     },
     si_thread = 0x0,
     si_note = {
       kl_list = {
         slh_first = 0xc39cd908
       },
       kl_lock = 0xc069c7ec <knlist_mtx_lock>,
       kl_unlock = 0xc069c820 <knlist_mtx_unlock>,
       kl_locked = 0xc069c85c <knlist_mtx_locked>,
       kl_lockarg = 0xc2dcd308
     },
     si_flags = 0
   },
   pipe_atime = {
     tv_sec = 1231065243,
     tv_nsec = 0
   },
   pipe_mtime = {
     tv_sec = 1231065243,
     tv_nsec = 0
   },
   pipe_ctime = {
     tv_sec = 1231065243,
     tv_nsec = 0
   },
   pipe_sigio = 0x0,
   pipe_peer = 0xc2dcd250,
   pipe_pair = 0xc2dcd198,
   pipe_state = 2176,
   pipe_busy = 0,
   pipe_present = 0
 }
 -----------------------------------------------
 
 
 I tried adding the KASSERT suggested in this email:
    http://lists.freebsd.org/pipermail/freebsd-current/2008-March/084001.html
 Unfortunately if I compile the kernel with INVARIANTS and INVARIANT_SUPPORT
 it boots into a panic (in single-user it boots to a shell prompt but panics
 fairly soon during commands such as 'ls'.)
 
 That's where I'm up to - it is probably time to get some input from experts who
 actually know something about the FreeBSD kernel.
 
 Kind regards,
 Yvan
