From scrappy@hub.org  Fri Jun  6 20:06:59 2003
Return-Path: <scrappy@hub.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CB42137B401
	for <FreeBSD-gnats-submit@freebsd.org>; Fri,  6 Jun 2003 20:06:59 -0700 (PDT)
Received: from hub.org (hub.org [64.117.225.220])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0F3BA43FB1
	for <FreeBSD-gnats-submit@freebsd.org>; Fri,  6 Jun 2003 20:06:59 -0700 (PDT)
	(envelope-from scrappy@hub.org)
Received: by hub.org (Postfix, from userid 1002)
	id 096D16BA011; Sat,  7 Jun 2003 00:06:57 -0300 (ADT)
Message-Id: <20030607030657.096D16BA011@hub.org>
Date: Sat,  7 Jun 2003 00:06:57 -0300 (ADT)
From: Marc G.Fournier <scrappy@hub.org>
Reply-To: Marc G.Fournier <scrappy@hub.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: union_lookup returning . (0xbc332e90) not same as startdir (0xc1fa8a40)
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         53004
>Category:       kern
>Synopsis:       union_lookup returning . (0xbc332e90) not same as startdir (0xc1fa8a40)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    das
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jun 06 20:10:07 PDT 2003
>Closed-Date:    Mon Jan 12 22:52:24 PST 2004
>Last-Modified:  Mon Jan 12 22:52:24 PST 2004
>Originator:     Marc G. Fournier
>Release:        FreeBSD 4.8-STABLE i386
>Organization:
Hub.Org Networking Services (http://www.hub.org)
>Environment:
System: FreeBSD hub.org 4.8-STABLE FreeBSD 4.8-STABLE #1: Sat May 31 22:57:04 ADT 2003 root@pluto.hub.org:/usr/obj/usr/src/sys/kernel i386


	
>Description:

Script started on Sat Jun  7 00:03:16 2003
jupiter# gdb -k kernel.debug vmcore.1
<copyright text deleted>
SMP 2 cpus
IdlePTD at phsyical address 0x0033c000
initial pcb at physical address 0x002adfa0
panicstr: union_lookup returning . (0xbc332e90) not same as startdir (0xc1fa8a40)
panic messages:
---
panic: union_lookup returning . (0xbc332e90) not same as startdir (0xc1fa8a40)
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1

syncing disks... 2 
done
Uptime: 13h15m28s

<dumping to ... deleted>
---
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
487		if (dumping++) {
(kgdb) where
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
#1  0x80150d9b in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:316
#2  0x8015120d in panic (fmt=0x8025c2a0 "union_lookup returning . (%p) not same as startdir (%p)") at /usr/src/sys/kern/kern_shutdown.c:595
#3  0x8018eee4 in union_lookup (ap=0xbc332e00) at /usr/src/sys/miscfs/union/union_vnops.c:615
#4  0x8017dac5 in lookup (ndp=0xbc332e7c) at vnode_if.h:52
#5  0x8017d5c0 in namei (ndp=0xbc332e7c) at /usr/src/sys/kern/vfs_lookup.c:153
#6  0x80183921 in lstat (p=0xbc28ed40, uap=0xbc332f80) at /usr/src/sys/kern/vfs_syscalls.c:1824
#7  0x8023da35 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134554112, tf_esi = 2143288252, tf_ebp = 2143279584, tf_isp = -1137496108, 
      tf_ebx = 2143281760, tf_edx = 134548962, tf_ecx = 990, tf_eax = 190, tf_trapno = 12, tf_err = 2, tf_eip = 673051168, tf_cs = 31, tf_eflags = 518, tf_esp = 2143279444, 
      tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1175
#8  0x8022af5b in Xint0x80_syscall ()
cannot read proc at 0
(kgdb) up 3
#3  0x8018eee4 in union_lookup (ap=0xbc332e00) at /usr/src/sys/miscfs/union/union_vnops.c:615
615			panic("union_lookup returning . (%p) not same as startdir (%p)", ap->a_vpp, dvp);
(kgdb) list
610	
611	#ifdef DIAGNOSTIC
612		if (cnp->cn_namelen == 1 &&
613		    cnp->cn_nameptr[0] == '.' &&
614		    *ap->a_vpp != dvp) {
615			panic("union_lookup returning . (%p) not same as startdir (%p)", ap->a_vpp, dvp);
616		}
617	#endif
618	
619		return (error);
(kgdb) print @ *ap
$1 = {a_desc = 0x80279880, a_dvp = 0xc1fa8a40, a_vpp = 0xbc332e90, a_cnp = 0xbc332ea4}
(kgdb) print ap->a_vpp
$2 = (struct vnode **) 0xbc332e90
(kgdb) print dvp
$3 = (struct vnode *) 0xc1fa8a40
(kgdb) quit
jupiter# exit
exit

Script done on Sat Jun  7 00:04:05 2003

If there is anything more from the vmcore that I can provide, please ask ...

>How-To-Repeat:

  There is a lot running on the machine, so it could be pretty much anything,
but I had just typed in pkg_delete -f <pkg> when it crashed ... it may or
may not be related ...

>Fix:

	


>Release-Note:
>Audit-Trail:

From: Tom Alsberg <alsbergt@cs.huji.ac.il>
To: freebsd-gnats-submit@FreeBSD.org, scrappy@hub.org
Cc:  
Subject: Re: kern/53004: union_lookup returning . (0xbc332e90) not same as startdir (0xc1fa8a40)
Date: Mon, 16 Jun 2003 12:01:47 +0300

 I noticed this a few days ago too, and sent a message to the
 FreeBSD-hackers list.  David Schultz <das@FreeBSD.ORG> asked me to
 repost this to GNATS as a followup to this PR.  Here it is,
 including a simple (and, as far as I have seen, "foolproof") way to
 reproduce it:
 
 <snip>
 From: Tom Alsberg <alsbergt@cs.huji.ac.il>
 To: FreeBSD Hackers List <freebsd-hackers@freebsd.org>
 Subject: (bug?) panic in union filesystem - file/.
 
 Hi there.
 
 I recently stumbled upon a crash in the union filesystem.  It seems
 that when trying to stat "<file>/." where file is a regular
 (non-directory) file in a union mounted filesystem, the system will
 panic.
 
 I first noticed this as an effect of zsh (Z shell)'s tab completion,
 which after I checked, tries to lstat "<file>/." if there are no other
 completions and the file exists, to see if it is a directory with
 other files in it which it should try to complete (I do not know why
 they chose to do it this way).
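A minimal userland sketch of the call described above, assuming a POSIX system: it creates a regular file (as touch(1) would) and then lstat()s "<file>/.", the path zsh's completion probes. On a filesystem that behaves correctly the lstat() should fail with ENOTDIR. The helper name probe_file_dot() and the /tmp path are hypothetical, chosen only for illustration. Run it only on an ordinary filesystem or a fixed kernel; on a union mount with an affected kernel this is exactly the call that panics.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a regular file (like touch(1)), then
 * lstat("<file>/.") the way zsh's completion does.  Returns 0 if the
 * lstat() succeeds, otherwise the errno it failed with.  A correct
 * filesystem should report ENOTDIR, since "<file>" is not a directory.
 *
 * WARNING: do not point this at a union mount on an unpatched kernel;
 * that is exactly the case that panics in this PR.
 */
static int probe_file_dot(const char *file)
{
    char path[1024];
    struct stat sb;
    int fd, err = 0;

    assert(file != NULL);
    fd = open(file, O_CREAT | O_WRONLY, 0644);
    if (fd == -1)
        return errno;
    close(fd);

    snprintf(path, sizeof(path), "%s/.", file);
    if (lstat(path, &sb) == -1)
        err = errno;            /* save before unlink() can change errno */
    unlink(file);
    return err;
}
```

For example, probe_file_dot("/tmp/pr53004_meow") is expected to return ENOTDIR on an ordinary filesystem.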
 
 It seems like a bug in the union filesystem to me.  I can reproduce it
 on both 4.8-STABLE and 5.1-CURRENT.
 
 Simplest way I reproduce it:
 
 # Create two directories somewhere:
 	cd /var/tmp
 	mkdir foo
 	mkdir bar
 # union-mount one on top of the other:
 	mount -t union bar foo
 # enter the mounted directory, create a regular file there, and read
 # <file>/.:
 	cd foo
 	touch meow
 	cat meow/.
 
 Everywhere I checked, there is a panic at that point:
 
 panic: union_lookup returning . (0xc8d83edc) not same as startdir (0xc8cb2e00)
 
 Relevant part of a backtrace (with gdb -k on saved core files of a
 4.8-CURRENT kernel compiled with debugging):
 
 <snip>
 #0  dumpsys () at /r+d/4.8/src/sys/kern/kern_shutdown.c:487
 #1  0xc022b067 in boot (howto=256) at /r+d/4.8/src/sys/kern/kern_shutdown.c:316
 #2  0xc022b4a5 in panic (
     fmt=0xc0420e80 "union_lookup returning . (%p) not same as startdir (%p)")
     at /r+d/4.8/src/sys/kern/kern_shutdown.c:595
 #3  0xc02674b8 in union_lookup (ap=0xc8d83d70)
     at /r+d/4.8/src/sys/miscfs/union/union_vnops.c:615
 #4  0xc02577fd in lookup (ndp=0xc8d83ec8) at vnode_if.h:52
 #5  0xc02572f8 in namei (ndp=0xc8d83ec8)
     at /r+d/4.8/src/sys/kern/vfs_lookup.c:153
 #6  0xc025fd43 in vn_open (ndp=0xc8d83ec8, fmode=1, cmode=0)
     at /r+d/4.8/src/sys/kern/vfs_vnops.c:138
 #7  0xc025be78 in open (p=0xc8d74ac0, uap=0xc8d83f80)
     at /r+d/4.8/src/sys/kern/vfs_syscalls.c:1029
 #8  0xc03c5a45 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
       tf_edi = 134564005, tf_esi = -1077939303, tf_ebp = -1077939744, 
       tf_isp = -925351980, tf_ebx = -1077939304, tf_edx = 134578912, 
       tf_ecx = 1, tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 134531788, 
       tf_cs = 31, tf_eflags = 663, tf_esp = -1077939788, tf_ss = 47})
     at /r+d/4.8/src/sys/i386/i386/trap.c:1175
 #9  0xc03b5995 in Xint0x80_syscall ()
 </snip>
 
 I looked a bit at the code of the union filesystem, and my best guess
 so far is that it is caused by union_allocvp putting NULL in
 (*ap->a_vpp) in (src/sys/miscfs/union/union_vnops.c,
                  union_lookup(...), about line 543):
 
         error = union_allocvp(ap->a_vpp, dvp->v_mount, dvp, upperdvp, cnp,
                               uppervp, lowervp, 1);
 
 which later triggers (src/sys/miscfs/union/union_vnops.c,
                       union_lookup(...), about line 573):
 
 #ifdef DIAGNOSTIC
         if (cnp->cn_namelen == 1 &&
             cnp->cn_nameptr[0] == '.' &&
             *ap->a_vpp != dvp) {
                 panic("union_lookup returning . (%p) not same as startdir (%p)", ap->a_vpp, dvp);
         }
 #endif
 
 But I'm not sure what exactly is wrong in or before union_allocvp, and
 I don't yet understand exactly what is going on in the code there.  I'm
 also not sure what the DIAGNOSTIC code is for, or why this specific case
 is special, but I see that union_lookup would just fail (and not panic)
 without it, so a kernel built without DIAGNOSTIC is perhaps a
 workaround...
 
 Can someone with more experience/understanding of the union filesystem
 take a look at this?
 
   Thanks,
   -- Tom
 </snip>
 
 -- 
   Tom Alsberg - hacker (being the best description fitting this space)
   Web page:	http://www.cs.huji.ac.il/~alsbergt/
 DISCLAIMER:  The above message does not even necessarily represent what
 my fingers have typed on the keyboard, save anything further.
 
 
From: Peter Edwards <peter.edwards@openet-telecom.com>
To: freebsd-gnats-submit@FreeBSD.org, scrappy@hub.org
Cc: alsbergt@cs.huji.ac.il
Subject: Re: kern/53004: union_lookup returning . (0xbc332e90) not same as startdir (0xc1fa8a40)
Date: Thu, 31 Jul 2003 14:46:24 +0100

 Hi,
 
 The DIAGNOSTIC code is checking that if you are looking up entry "." in
 directory "dir", then the returned node should be the same as the one passed
 in (i.e., "." must be a hard link to the directory being searched). You're
 looking up "." in something that's not a directory, so the lookup has failed.
 In this case there is no returned vnode, so the check is invalid.
 For Tom's example, error is definitely "ENOTDIR" at that point. Can you check 
 your core to see if this is definitely the case?
 
 Try adding error == 0 to the start of the "if" surrounding the panic:
 
 >        if (cnp->cn_namelen == 1 &&
 becomes
 >        if (error == 0 && cnp->cn_namelen == 1 &&
 
 I also figure that the ap->a_vpp in the panic line should be *ap->a_vpp, so
 you can actually see the returned vnode rather than the pointer to its
 container.  As it is, the message puts apples [vnode *] next to oranges
 [vnode **].
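The difference between printing ap->a_vpp and *ap->a_vpp can be shown with a small userland sketch; struct vnode is a hypothetical stand-in and format_dot_panic() is a made-up helper. Marc's session above is consistent with this: print ap->a_vpp gives (struct vnode **) 0xbc332e90, the same value the panic message printed as the "returned" vnode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct vnode { int v_id; };   /* hypothetical stand-in, not the kernel struct */

/*
 * Format the DIAGNOSTIC panic string.  With deref == false this passes
 * vpp itself, as the original code does, so %p shows the address of the
 * caller's result slot (a struct vnode **).  With deref == true it passes
 * *vpp, the vnode the lookup actually returned, so both %p values are
 * vnode addresses and can be compared meaningfully.
 */
static int format_dot_panic(char *buf, size_t len, struct vnode **vpp,
                            struct vnode *dvp, bool deref)
{
    assert(vpp != NULL);
    return snprintf(buf, len,
        "union_lookup returning . (%p) not same as startdir (%p)",
        deref ? (void *)*vpp : (void *)vpp, (void *)dvp);
}
```

With deref set to false, the first %p is the address of the caller's result variable; with deref set to true, both values are vnode addresses, so a match or mismatch is visible directly.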
 
 Cheers,
 Peter.
 
Responsible-Changed-From-To: freebsd-bugs->das 
Responsible-Changed-By: das 
Responsible-Changed-When: Thu Nov 13 23:33:42 PST 2003 
Responsible-Changed-Why:  
Over to me. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=53004 
State-Changed-From-To: open->patched 
State-Changed-By: das 
State-Changed-When: Fri Nov 14 00:24:52 PST 2003 
State-Changed-Why:  
Fix committed, awaiting MFC. 
src/sys/fs/unionfs/union_vnops.c,v 1.103 

State-Changed-From-To: patched->closed 
State-Changed-By: das 
State-Changed-When: Mon Jan 12 22:52:07 PST 2004 
State-Changed-Why:  
MFC'd.  Sorry for the delay. 

>Unformatted:
