From pm@zin.lublin.pl  Tue Mar  2 13:39:13 2004
Return-Path: <pm@zin.lublin.pl>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id AAA7216A4CE; Tue,  2 Mar 2004 13:39:13 -0800 (PST)
Received: from shellma.zin.lublin.pl (shellma.zin.lublin.pl [212.182.126.68])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 5CC6043D2D; Tue,  2 Mar 2004 13:39:12 -0800 (PST)
	(envelope-from pm@zin.lublin.pl)
Received: by shellma.zin.lublin.pl (Postfix, from userid 1018)
	id 216CB5F103; Tue,  2 Mar 2004 22:39:36 +0100 (CET)
Message-Id: <20040302213936.216CB5F103@shellma.zin.lublin.pl>
Date: Tue,  2 Mar 2004 22:39:36 +0100 (CET)
From: Paweł Małachowski <pawmal-posting@freebsd.lublin.pl>
Reply-To: Paweł Małachowski <pawmal-posting@freebsd.lublin.pl>
To: FreeBSD-gnats-submit@freebsd.org
Cc: stable@freebsd.org
Subject: Using read-only NULLFS leads to panic. gdb output included, easy to reproduce.
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         63662
>Category:       kern
>Synopsis:       [nullfs] using read-only NULLFS leads to panic. gdb output included, easy to reproduce.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Mar 02 13:40:04 PST 2004
>Closed-Date:    Sun Mar 11 20:38:12 GMT 2007
>Last-Modified:  Sun Mar 11 20:38:12 GMT 2007
>Originator:     Pawel Malachowski
>Release:        FreeBSD 4.7-RELEASE-p25 i386
>Organization:
ZiN
>Environment:
RELENG_4

	
>Description:
I know NULLFS is documented as broken and incoming PRs are usually put
in suspended state, awaiting a patch.
However, there are people claiming that using NULLFS in read-only mode
is safe. It seems they are wrong.

I'm not too familiar with debugging, but I decided to use my free
time to provide more than a backtrace, in the hope that someone will
take a look at this (maybe it is trivial to fix?).


Environment:
(A) FreeBSD 4.9-RELEASE, null.ko.
(B) FreeBSD 4.9-STABLE, NULLFS, almost GENERIC (+IPFIREWALL, IPFILTER...)
(C) FreeBSD 4.8-RELEASE, GENERIC, nullfs.ko (+ipfw.ko)

The original problem hit me on machine A:
% mount | grep -c 'null, local, read-only'
23
It usually comes at night, when cron is doing its job, especially
periodic tasks.

However, I took machine B (a completely different, pure routing box) and
C (GENERIC+debug), and successfully reproduced this crash with an
identical backtrace this way:
mount_null -o ro /usr/ports /mnt/1
mount_null -o ro /usr/ports /mnt/2
mount_null -o ro /usr/ports /mnt/3
find /usr/ports -type f -perm -u+s &
find /usr/ports -type f -perm -u+s &
...
find /mnt/1 -type f -perm -u+s &
find /mnt/1 -type f -perm -u+s &
...
find /mnt/2 -type f -perm -u+s &
find /mnt/2 -type f -perm -u+s &
...

(Machine C crashed after a few minutes).


(C)
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0255ab7
stack pointer           = 0x10:0xcbb38e90
frame pointer           = 0x10:0xcbb38ea4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 58363 (find)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 65 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
giving up on 1 buffers
Uptime: 24d9h54m57s
(kgdb) add-symbol-file /sys/modules/nullfs/null.ko 0xC1424388
add symbol table from file "/sys/modules/nullfs/null.ko" at text_addr = 0xc1424388?
(y or n) y
Reading symbols from /sys/modules/nullfs/null.ko...done.
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc0227653 in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc0227a78 in poweroff_wait (junk=0xc0421bec, howto=-1069410545)
    at ../../kern/kern_shutdown.c:595
#3  0xc03a522e in trap_fatal (frame=0xcbb38e50, eva=4)
    at ../../i386/i386/trap.c:974
#4  0xc03a4f01 in trap_pfault (frame=0xcbb38e50, usermode=0, eva=4)
    at ../../i386/i386/trap.c:867
#5  0xc03a4abf in trap (frame={tf_fs = 65552, tf_es = 16842768,
      tf_ds = -877461488, tf_edi = -877520608, tf_esi = -875975552,
      tf_ebp = -877424988, tf_isp = -877425028, tf_ebx = 0, tf_edx = 6,
      tf_ecx = -877520608, tf_eax = -877520608, tf_trapno = 12, tf_err = 0,
      tf_eip = -1071293769, tf_cs = 8, tf_eflags = 66178, tf_esp = -1054023424,
      tf_ss = 58363}) at ../../i386/i386/trap.c:466
#6  0xc0255ab7 in vput (vp=0x0) at ../../kern/vfs_subr.c:1608
#7  0xc14252e2 in null_inactive (ap=0xcbb38ee4)
    at /usr/src/sys/modules/nullfs/../../miscfs/nullfs/null_vnops.c:728
#8  0xc0255a57 in vrele (vp=0xcbc9ac80) at vnode_if.h:815
#9  0xc0257e47 in fchdir (p=0xcbb21920, uap=0xcbb38f80)
    at ../../kern/vfs_syscalls.c:842
#10 0xc03a54dd in syscall2 (frame={tf_fs = 134545455, tf_es = 47,
      tf_ds = -1078001617, tf_edi = 134626560, tf_esi = 5, tf_ebp = -1077938908,
      tf_isp = -877424684, tf_ebx = 672079852, tf_edx = 134561920,
      tf_ecx = 672154432, tf_eax = 13, tf_trapno = 7, tf_err = 2,
      tf_eip = 671764044, tf_cs = 31, tf_eflags = 663, tf_esp = -1077939048,
      tf_ss = 47}) at ../../i386/i386/trap.c:1175
#11 0xc03962f5 in Xint0x80_syscall ()
#12 0x280a074d in ?? ()
(kgdb) frame 0
#0  dumpsys () at ../../kern/kern_shutdown.c:487
487             if (dumping++) {
(kgdb) up 6
#6  0xc0255ab7 in vput (vp=0x0) at ../../kern/vfs_subr.c:1608
1608            struct proc *p = curproc;       /* XXX */
(kgdb) l
1603
1604    void
1605    vput(vp)
1606            struct vnode *vp;
1607    {
1608            struct proc *p = curproc;       /* XXX */
1609
1610            KASSERT(vp != NULL, ("vput: null vp"));
1611
1612            simple_lock(&vp->v_interlock);
(kgdb) p vp
$1 = (struct vnode *) 0x0
(kgdb) up
#7  0xc14252e2 in null_inactive (ap=0xcbb38ee4)
    at /usr/src/sys/modules/nullfs/../../miscfs/nullfs/null_vnops.c:728
728             vput(lowervp);
(kgdb) l
723             if (vp->v_vnlock != NULL) {
724                     vp->v_vnlock = &xp->null_lock;  /* we no longer share the lock */
725             } else
726                     VOP_UNLOCK(vp, LK_THISLAYER, p);
727
728             vput(lowervp);
729             /*
730              * Now it is safe to drop references to the lower vnode.
731              * VOP_INACTIVE() will be called by vrele() if necessary.
732              */
(kgdb) p lowervp
$2 = (struct vnode *) 0x0
(kgdb) l -
713             struct vnode *vp = ap->a_vp;
714             struct proc *p = ap->a_p;
715             struct null_node *xp = VTONULL(vp);
716             struct vnode *lowervp = xp->null_lowervp;
717
718             lockmgr(&null_hashlock, LK_EXCLUSIVE, NULL, p);
719             LIST_REMOVE(xp, null_hash);
720             lockmgr(&null_hashlock, LK_RELEASE, NULL, p);
721
722             xp->null_lowervp = NULLVP;
(kgdb) p *xp
$4 = {null_lock = {lk_interlock = {lock_data = -1054640128}, lk_flags = 64,
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 8,
    lk_wmesg = 0xc142548d "nullnode", lk_timo = 0, lk_lockholder = -1},
  null_vnlock = 0x0, null_hash = {le_next = 0x0, le_prev = 0xc12c4de4},
  null_lowervp = 0x0, null_vnode = 0xcbc9ac80}
(kgdb) p xp->null_lowervp
$5 = (struct vnode *) 0x0
(kgdb) p vp
$7 = (struct vnode *) 0xcbc9ac80
(kgdb) p vp->v_data
$8 = (void *) 0xc12ce100
(kgdb) p (struct null_node) vp->v_data
$10 = {null_lock = {lk_interlock = {lock_data = -1054023424}, lk_flags = 0,
    lk_sharecount = 0, lk_waitcount = -875975424, lk_exclusivecount = -21376,
    lk_prio = -13367, lk_wmesg = 0x0, lk_timo = 0, lk_lockholder = 0},
  null_vnlock = 0x0, null_hash = {le_next = 0x0, le_prev = 0x0},
  null_lowervp = 0x0, null_vnode = 0x0}
(kgdb) p ((struct null_node)vp->v_data)->null_lowervp
$11 = (struct vnode *) 0x0
(kgdb) up
#9  0xc0257e47 in fchdir (p=0xcbb21920, uap=0xcbb38f80)
    at ../../kern/vfs_syscalls.c:842
842             vrele(fdp->fd_cdir);
(kgdb) l
837             if (error) {
838                     vput(vp);
839                     return (error);
840             }
841             VOP_UNLOCK(vp, 0, p);
842             vrele(fdp->fd_cdir);
843             fdp->fd_cdir = vp;
844             return (0);
845     }
846
(kgdb) p (struct null_node) fdp->fd_cdir->v_data
$16 = {null_lock = {lk_interlock = {lock_data = -1054023424}, lk_flags = 0,
    lk_sharecount = 0, lk_waitcount = -875975424, lk_exclusivecount = -21376,
    lk_prio = -13367, lk_wmesg = 0x0, lk_timo = 0, lk_lockholder = 0},
  null_vnlock = 0x0, null_hash = {le_next = 0x0, le_prev = 0x0},
  null_lowervp = 0x0, null_vnode = 0x0}
(kgdb) l fchdir
806     fchdir(p, uap)
807             struct proc *p;
808             struct fchdir_args /* {
809                     syscallarg(int) fd;
810             } */ *uap;
811     {
812             register struct filedesc *fdp = p->p_fd;
813             struct vnode *vp, *tdp;
814             struct mount *mp;
815             struct file *fp;
(kgdb) p (struct null_node) p->p_fd->fd_cdir->v_data
$20 = {null_lock = {lk_interlock = {lock_data = -1054023424}, lk_flags = 0,
    lk_sharecount = 0, lk_waitcount = -875975424, lk_exclusivecount = -21376,
    lk_prio = -13367, lk_wmesg = 0x0, lk_timo = 0, lk_lockholder = 0},
  null_vnlock = 0x0, null_hash = {le_next = 0x0, le_prev = 0x0},
  null_lowervp = 0x0, null_vnode = 0x0}
(kgdb) p *p
$22 = {p_procq = {tqe_next = 0xcbb20f60, tqe_prev = 0xc04a97d0}, p_list = {
    le_next = 0xcbb20f60, le_prev = 0xc04a9778}, p_cred = 0xc0f731e0,
  p_fd = 0xc10ee500, p_stats = 0xcbb36cd0, p_limit = 0xc11e9e00,
  p_upages_obj = 0xc049b5c0, p_procsig = 0xc1387880, p_flag = 16390,
  p_stat = 2 '\002', p_pad1 = "\000\000", p_pid = 58363, p_hash = {le_next = 0x0,
    le_prev = 0xc0a815ec}, p_pglist = {le_next = 0x0, le_prev = 0xc13ecc28},
  p_pptr = 0xcbb1fd80, p_sibling = {le_next = 0xcbb20f60, le_prev = 0xcbb1fdd0},
  p_children = {lh_first = 0x0}, p_ithandle = {callout = 0xc2befd50}, p_oppid = 0,
  p_dupfd = 0, p_vmspace = 0xcbb52880, p_estcpu = 295, p_cpticks = 75,
  p_pctcpu = 1182, p_wchan = 0x0, p_wmesg = 0xc04113ea "inode", p_swtime = 54,
  p_slptime = 0, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0},
    it_value = {tv_sec = 0, tv_usec = 0}}, p_runtime = 5487340, p_uu = 0,
  p_su = 136, p_iu = 0, p_uticks = 99, p_sticks = 2561, p_iticks = 7,
  p_traceflag = 0, p_tracep = 0x0, p_siglist = {__bits = {0, 0, 0, 0}},
  p_textvp = 0xcb96f300, p_lock = 0 '\000', p_oncpu = 0 '\000',
  p_lastcpu = 0 '\000', p_rqindex = 2 '\002', p_locks = -175, p_simple_locks = 0,
  p_stops = 0, p_stype = 0, p_step = 0 '\000', p_pfsflags = 0 '\000',
  p_pad3 = "\000", p_retval = {0, 134561920}, p_sigiolst = {slh_first = 0x0},
  p_sigparent = 20, p_oldsigmask = {__bits = {0, 0, 0, 0}}, p_sig = 0, p_code = 0,
  p_klist = {slh_first = 0x0}, p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {
    ss_sp = 0x0, ss_size = 0, ss_flags = 4}, p_priority = 8 '\b',
  p_usrpri = 86 'V', p_nice = 0 '\000',
  p_comm = "find\000n\000\000\000\000\000\000\000\000\000\000",
  p_pgrp = 0xc13ecc20, p_sysent = 0xc044b420, p_rtprio = {type = 1, prio = 0},
  p_prison = 0x0, p_args = 0xc12dc300, p_addr = 0xcbb36000, p_md = {
    md_regs = 0xcbb38fa8}, p_xstat = 0, p_acflag = 2, p_ru = 0x0, p_nthreads = 0,
  p_aioinfo = 0x0, p_wakeup = 0, p_peers = 0x0, p_leader = 0xcbb21920, p_asleep = {
    as_priority = 0, as_timo = 0}, p_emuldata = 0x0}
(kgdb)

Why is null_lowervp NULL? It may be significant that the problem
appears when I search the non-null /usr/ports and the null /mnt/x at
the same time.
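
One reading of the listing above, offered as an assumption rather than
a diagnosis: lowervp is loaded at line 716, before line 722 clears
xp->null_lowervp, so a NULL at line 728 means the field was already NULL
when null_inactive() was entered, for example because the same node had
gone through null_inactive() once before. The user-space model below only
illustrates that ordering; struct model_node and model_inactive() are
hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of struct null_node that matters here. */
struct model_node {
	void *lowervp;		/* plays the role of xp->null_lowervp */
};

/*
 * Mirrors the order of operations in the listing: read the lower vnode
 * pointer first, clear the field, then hand the saved pointer to vput().
 * Returns the pointer that vput() would receive.
 */
void *
model_inactive(struct model_node *xp)
{
	void *lowervp;

	assert(xp != NULL);
	lowervp = xp->lowervp;	/* corresponds to line 716 */
	xp->lowervp = NULL;	/* corresponds to line 722 */
	return (lowervp);	/* argument of vput() at line 728 */
}
```

In this model the first call on a node returns the lower pointer and
every later call on the same node returns NULL, which is the value kgdb
shows for lowervp in frame 7.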

It may also be interesting that on machine B there were about 30 find(1)
processes running at one point, and all of them got stuck in the inode
state, becoming zombies. New processes were also unable to enter
/usr/ports (`cd /usr/ports' -> frozen shell). After reboot(8) the machine
failed to reboot because of these inode-state processes. A power-off/on
cycle was necessary...



Other panic messages:

(A, this _one_ is less common)
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc02766eb
stack pointer           = 0x10:0xe9589dd0
frame pointer           = 0x10:0xe9589de4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 80250 (cron)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 28 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
giving up on 1 buffers
Uptime: 2d20h42m48s
(kgdb) add-symbol-file /sys/modules/nullfs/null.ko 0xC3811390
add symbol table from file "/sys/modules/nullfs/null.ko" at text_addr = 0xc3811390?
(y or n) y
Reading symbols from /sys/modules/nullfs/null.ko...done.
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc0247b4b in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc0247f70 in poweroff_wait (junk=0xc044a62c, howto=-1069244113)
    at ../../kern/kern_shutdown.c:595
#3  0xc03c2dba in trap_fatal (frame=0xe9589d90, eva=4)
    at ../../i386/i386/trap.c:974
#4  0xc03c2a8d in trap_pfault (frame=0xe9589d90, usermode=0, eva=4)
    at ../../i386/i386/trap.c:867
#5  0xc03c264b in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16,
      tf_edi = -388392512, tf_esi = -374358784, tf_ebp = -380068380,
      tf_isp = -380068420, tf_ebx = 0, tf_edx = 6, tf_ecx = -388392512,
      tf_eax = -388392512, tf_trapno = 12, tf_err = 0, tf_eip = -1071159573,
      tf_cs = 8, tf_eflags = 66182, tf_esp = -1007055424, tf_ss = 80250})
    at ../../i386/i386/trap.c:466
#6  0xc02766eb in vput (vp=0x0) at ../../kern/vfs_subr.c:1629
#7  0xc38122ea in null_inactive (ap=0xe9589e24)
    at /src/sys/modules/nullfs/../../miscfs/nullfs/null_vnops.c:728
#8  0xc027668b in vrele (vp=0xe9afbd00) at vnode_if.h:815
#9  0xc027cf23 in vn_close (vp=0xe9afbd00, flags=1, cred=0xc54d3100, p=0xe8d999c0)
    at ../../kern/vfs_vnops.c:235
#10 0xc027d843 in vn_closefile (fp=0xc4f78ac0, p=0xe8d999c0)
    at ../../kern/vfs_vnops.c:693
#11 0xc023d6c3 in fdrop (fp=0xc4f78ac0, p=0xe8d999c0) at ../../sys/file.h:218
#12 0xc023d60c in closef (fp=0xc4f78ac0, p=0xe8d999c0)
    at ../../kern/kern_descrip.c:1441
#13 0xc023c743 in close (p=0xe8d999c0, uap=0xe9589f80)
    at ../../kern/kern_descrip.c:623
#14 0xc03c3069 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = 134574392, tf_esi = 1, tf_ebp = -1077941168, tf_isp = -380067884,
      tf_ebx = 672113388, tf_edx = 134574080, tf_ecx = 134574080, tf_eax = 6,
      tf_trapno = 12, tf_err = 2, tf_eip = 672066564, tf_cs = 31, tf_eflags = 643,
      tf_esp = -1077941212, tf_ss = 47}) at ../../i386/i386/trap.c:1175
#15 0xc03b40e5 in Xint0x80_syscall ()
#16 0x280df523 in ?? ()
(kgdb) up 6
#6  0xc02766eb in vput (vp=0x0) at ../../kern/vfs_subr.c:1629
1629            struct proc *p = curproc;       /* XXX */
(kgdb) l
1624
1625    void
1626    vput(vp)
1627            struct vnode *vp;
1628    {
1629            struct proc *p = curproc;       /* XXX */
1630
1631            KASSERT(vp != NULL, ("vput: null vp"));
1632
1633            simple_lock(&vp->v_interlock);
(kgdb) p vp
$1 = (struct vnode *) 0x0
(kgdb) p (struct null_node) vp->v_data
$2 = {null_lock = {lk_interlock = {lock_data = -1007055424}, lk_flags = 0,
    lk_sharecount = 0, lk_waitcount = -374358656, lk_exclusivecount = -17152,
    lk_prio = -5713, lk_wmesg = 0x0, lk_timo = 0, lk_lockholder = 0},
  null_vnlock = 0x0, null_hash = {le_next = 0x0, le_prev = 0x0},
  null_lowervp = 0x0, null_vnode = 0x0}
(kgdb) up
#9  0xc027cf23 in vn_close (vp=0xe9afbd00, flags=1, cred=0xc54d3100, p=0xe8d999c0)
    at ../../kern/vfs_vnops.c:235
235             vrele(vp);
(kgdb) l
230             int error;
231
232             if (flags & FWRITE)
233                     vp->v_writecount--;
234             error = VOP_CLOSE(vp, flags, cred, p);
235             vrele(vp);
236             return (error);
237     }
238
239     static __inline
(kgdb) up
#10 0xc027d843 in vn_closefile (fp=0xc4f78ac0, p=0xe8d999c0)
    at ../../kern/vfs_vnops.c:693
693             return (vn_close(((struct vnode *)fp->f_data), fp->f_flag,
(kgdb) l
688             struct file *fp;
689             struct proc *p;
690     {
691
692             fp->f_ops = &badfileops;
693             return (vn_close(((struct vnode *)fp->f_data), fp->f_flag,
694                     fp->f_cred, p));
695     }
696
697     static int
(kgdb) p (struct vnode) fp->f_data
$11 = {v_flag = 3920608512, v_usecount = 0, v_writecount = 0,
  v_holdcnt = 858863156, v_id = 0, v_mount = 0x0, v_op = 0xc34adbc8, v_freelist = {
    tqe_next = 0xc4fe7c00, tqe_prev = 0xc3fa04c8}, v_nmntvnodes = {tqe_next = 0x0,
    tqe_prev = 0xe9032180}, v_cleanblkhd = {tqh_first = 0xe905a680,
    tqh_last = 0xe9032100}, v_dirtyblkhd = {tqh_first = 0x33730a00,
    tqh_last = 0x6d373639}, v_synclist = {le_next = 0x67706a2e, le_prev = 0x0},
  v_numoutput = -385670912, v_type = VNON, v_un = {vu_mountedhere = 0x0,
    vu_socket = 0x0, vu_spec = {vu_specinfo = 0x0, vu_specnext = {
        sle_next = 0x67616d00}}, vu_fifoinfo = 0x0}, v_lease = 0x0,
  v_lastw = -1018520864, v_cstart = 0, v_lasta = -994427576, v_clen = -986729152,
  v_object = 0xc3e98450, v_interlock = {lock_data = -374519936}, v_vnlock = 0x0,
  v_tag = 1747847424, v_data = 0x63636174, v_cache_src = {lh_first = 0x737365},
  v_cache_dst = {tqh_first = 0x0, tqh_last = 0x0}, v_dd = 0x0,
  v_ddid = 1747873904, v_pollinfo = {vpi_lock = {lock_data = 1093599266},
    vpi_selinfo = {si_pid = 0, si_note = {slh_first = 0xc3a6fd00},
      si_flags = 4352}, vpi_events = -28088, vpi_revents = -15367}, v_vxproc = 0x0}
(kgdb)  p (struct null_node)((struct vnode) fp->f_data)->v_data
$13 = {null_lock = {lk_interlock = {lock_data = 1667457396}, lk_flags = 7566181,
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 0,
    lk_wmesg = 0x682e7070 <Address 0x682e7070 out of bounds>,
    lk_timo = 1093599266, lk_lockholder = 0}, null_vnlock = 0xc3a6fd00,
  null_hash = {le_next = 0xc4b71100, le_prev = 0xc3f99248}, null_lowervp = 0x0,
  null_vnode = 0xe9baaf80}
(kgdb) up
#12 0xc023d60c in closef (fp=0xc4f78ac0, p=0xe8d999c0)
    at ../../kern/kern_descrip.c:1441
1441            return (fdrop(fp, p));
(kgdb) l
1436                                            wakeup(fdtol);
1437                                    }
1438                            }
1439                    }
1440            }
1441            return (fdrop(fp, p));
1442    }
1443
1444    int
1445    fdrop(fp, p)
(kgdb)  p (struct null_node)((struct vnode) fp->f_data)->v_data
$15 = {null_lock = {lk_interlock = {lock_data = 1667457396}, lk_flags = 7566181,
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 0,
    lk_wmesg = 0x682e7070 <Address 0x682e7070 out of bounds>,
    lk_timo = 1093599266, lk_lockholder = 0}, null_vnlock = 0xc3a6fd00,
  null_hash = {le_next = 0xc4b71100, le_prev = 0xc3f99248}, null_lowervp = 0x0,
  null_vnode = 0xe9baaf80}
(kgdb) up
#13 0xc023c743 in close (p=0xe8d999c0, uap=0xe9589f80)
    at ../../kern/kern_descrip.c:623
623             error = closef(fp, p);
(kgdb) l
618                     fdp->fd_lastfile--;
619             if (fd < fdp->fd_freefile)
620                     fdp->fd_freefile = fd;
621             if (fd < fdp->fd_knlistsize)
622                     knote_fdclose(p, fd);
623             error = closef(fp, p);
624             if (holdleaders) {
625                     fdp->fd_holdleaderscount--;
626                     if (fdp->fd_holdleaderscount == 0 &&
627                         fdp->fd_holdleaderswakeup != 0) {
(kgdb)  p (struct null_node)((struct vnode) fp->f_data)->v_data
$18 = {null_lock = {lk_interlock = {lock_data = 1667457396}, lk_flags = 7566181,
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 0,
    lk_wmesg = 0x682e7070 <Address 0x682e7070 out of bounds>,
    lk_timo = 1093599266, lk_lockholder = 0}, null_vnlock = 0xc3a6fd00,
  null_hash = {le_next = 0xc4b71100, le_prev = 0xc3f99248}, null_lowervp = 0x0,
  null_vnode = 0xe9baaf80}
(kgdb) up
#14 0xc03c3069 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = 134574392, tf_esi = 1, tf_ebp = -1077941168, tf_isp = -380067884,
      tf_ebx = 672113388, tf_edx = 134574080, tf_ecx = 134574080, tf_eax = 6,
      tf_trapno = 12, tf_err = 2, tf_eip = 672066564, tf_cs = 31, tf_eflags = 643,
      tf_esp = -1077941212, tf_ss = 47}) at ../../i386/i386/trap.c:1175
1175            error = (*callp->sy_call)(p, args);



(A)
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc02766eb
stack pointer           = 0x10:0xe8dcfe90
frame pointer           = 0x10:0xe8dcfea4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 91056 (find)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 73 27 1 1 1 1 1 1 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
giving up on 1 buffers
Uptime: 5d8h9m51s
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc0247b4b in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc0247f70 in poweroff_wait (junk=0xc044a62c, howto=-1069244113)
    at ../../kern/kern_shutdown.c:595
#3  0xc03c2dba in trap_fatal (frame=0xe8dcfe50, eva=4)
    at ../../i386/i386/trap.c:974
#4  0xc03c2a8d in trap_pfault (frame=0xe8dcfe50, usermode=0, eva=4)
    at ../../i386/i386/trap.c:867
#5  0xc03c264b in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16,
      tf_edi = -388593440, tf_esi = -373419328, tf_ebp = -388170076,
      tf_isp = -388170116, tf_ebx = 0, tf_edx = 6, tf_ecx = -388593440,
      tf_eax = -388593440, tf_trapno = 12, tf_err = 0, tf_eip = -1071159573,
      tf_cs = 8, tf_eflags = 66178, tf_esp = -1013564992, tf_ss = 91056})
    at ../../i386/i386/trap.c:466
#6  0xc02766eb in vput (vp=0x0) at ../../kern/vfs_subr.c:1629
#7  0xc38262ea in ?? ()
#8  0xc027668b in vrele (vp=0xe9be12c0) at vnode_if.h:815
#9  0xc0278a83 in fchdir (p=0xe8d688e0, uap=0xe8dcff80)
    at ../../kern/vfs_syscalls.c:843
#10 0xc03c3069 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = 134623232, tf_esi = 5, tf_ebp = -1077937660, tf_isp = -388169772,
      tf_ebx = 672080620, tf_edx = 134557696, tf_ecx = 672155200, tf_eax = 13,
      tf_trapno = 7, tf_err = 2, tf_eip = 671764800, tf_cs = 31, tf_eflags = 659,
      tf_esp = -1077937800, tf_ss = 47}) at ../../i386/i386/trap.c:1175
#11 0xc03b40e5 in Xint0x80_syscall ()
#12 0x280a0a41 in ?? ()




(B)
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc02766eb
stack pointer           = 0x10:0xe8dcfe90
frame pointer           = 0x10:0xe8dcfea4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 91056 (find)
interrupt mask          = none
trap number             = 12
panic: page fault
syncing disks... 73 27 1 1 1 1 1 1 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
giving up on 1 buffers
Uptime: 5d8h9m51s



(B)
instruction pointer     = 0x8:0xc0269bc7
stack pointer           = 0x10:0xd5d45e90
frame pointer           = 0x10:0xd5d45ea4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 558 (find)
interrupt mask          = none
trap number             = 12
panic: page fault
syncing disks... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
giving up on 1 buffers
Uptime: 21m42s
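
Every trap above that prints a fault address reports 0x4 together with
vput (vp=0x0). A plausible explanation, assuming the field order shown
in the kgdb dump of struct vnode (v_flag, then v_usecount) and 4-byte
fields on i386, is that vput() read vp->v_usecount through the NULL
pointer; the KASSERT in the listing presumably did not fire because the
kernel was built without INVARIANTS. The sketch below models only those
leading fields; struct model_vnode and model_null_deref_address() are
hypothetical and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the first fields of the 4.x struct vnode, in the
 * order printed by kgdb above.  Fixed-width types stand in for the i386
 * sizes of u_long and int so the result does not depend on the host.
 */
struct model_vnode {
	uint32_t v_flag;
	int32_t  v_usecount;
	int32_t  v_writecount;
};

/* Compile-time check of the layout this model relies on. */
static_assert(offsetof(struct model_vnode, v_usecount) == 4,
    "v_usecount is expected at offset 4 in this model");

/* Address that a read of vp->v_usecount would touch if vp were NULL. */
size_t
model_null_deref_address(void)
{
	return (offsetof(struct model_vnode, v_usecount));
}
```

Under these assumptions the function returns 4, matching the reported
fault virtual address.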


>How-To-Repeat:

mount_null -o ro /usr/ports /mnt/1
mount_null -o ro /usr/ports /mnt/2
mount_null -o ro /usr/ports /mnt/3
find /usr/ports -type f -perm -u+s &
find /usr/ports -type f -perm -u+s &
...
find /mnt/1 -type f -perm -u+s &
find /mnt/1 -type f -perm -u+s &
...
find /mnt/2 -type f -perm -u+s &
find /mnt/2 -type f -perm -u+s &
...

>Fix:

Unknown.

>Release-Note:
>Audit-Trail:

From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Paweł Małachowski <pawmal-posting@freebsd.lublin.pl>
Cc: FreeBSD-gnats-submit@FreeBSD.org, stable@FreeBSD.org
Subject: Re: kern/63662: Using read-only NULLFS leads to panic. gdb output included, easy to reproduce.
Date: Mon, 8 Mar 2004 00:22:40 +0100

 --DWqF7Vcvgq9cBwZ0
 Content-Type: text/plain; charset=iso-8859-2
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Tue, Mar 02, 2004 at 10:39:36PM +0100, Paweł Małachowski wrote:
 +> >Synopsis:       Using read-only NULLFS leads to panic. gdb output included, easy to reproduce.
 
 I'm not able to reproduce it on -CURRENT.
 tjr@ was working on nullfs problems (mostly deadlocks), but all his work
 was done in -CURRENT AFAIR. I'm afraid that no one will dare to touch
 nullfs in 4.x.
 
 -- 
 Pawel Jakub Dawidek                       http://www.FreeBSD.org
 pjd@FreeBSD.org                           http://garage.freebsd.pl
 FreeBSD committer                         Am I Evil? Yes, I Am!
 
 --DWqF7Vcvgq9cBwZ0
 Content-Type: application/pgp-signature
 Content-Disposition: inline
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.2.4 (FreeBSD)
 
 iD8DBQFAS67AForvXbEpPzQRAtPWAJsHto/1b4RkufWDPMXcWAgGPXGXfACeLtlt
 ysxDOgaAGZAG6771k85JzH4=
 =jpsb
 -----END PGP SIGNATURE-----
 
 --DWqF7Vcvgq9cBwZ0--
State-Changed-From-To: open->patched 
State-Changed-By: tjr 
State-Changed-When: Mon Mar 8 16:05:23 PST 2004 
State-Changed-Why:  
Fixed in 5.2 and current. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=63662 
Responsible-Changed-From-To: freebsd-bugs->tjr 
Responsible-Changed-By: tjr 
Responsible-Changed-When: Mon Mar 8 16:05:48 PST 2004 
Responsible-Changed-Why:  
I will MFC the fix. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=63662 

Adding to audit trail from misfiled PRs 63669, 63684, 63694, 63704,
63918, 63920, 63959, and 64052, possibly out of order with the
above status:

 [tjr]:
 There are known bugs in nullfs in all 4.x releases to date, and in 5.0.
 If I have time, I may MFC the fixes some time before 4.10 is released.
 Can you reproduce these problems on 5.1 or 5.2?

 Procedure:
 mount_null -o ro /usr/ports /mnt/1
 find /usr/ports -type f -perm -u+s &
 find /mnt/1 -type f -perm -u+s &
 ...
 <about 30-40 find(1) processes were enough>

 [pm]:
 I was not able to reproduce the panic on a 5.1-RELEASE system.

 However, I've reproduced the second problem described in this PR,
 which happened to me on 4.9-STABLE once.
 *Every* time (I've tried four times) 5.1-RELEASE becomes unusable
 and non-rebootable: all find(1) processes get stuck in the 'ufs' state,
 the CPU is 100% idle, there is no disk activity, and other processes
 freeze when they try to read from the filesystem. I'm able to switch
 between consoles and to observe top(1) (started earlier). Ctrl+Alt+Del
 does not work.

 One mount_null is sufficient, but more than one null mount seems
 (or it was an accident) to decrease the time we have to wait.

 [tjr]:
 I'm sorry - the commit that fixed the deadlock issue you're describing
 here occurred between 5.1 and 5.2, not between 5.0 and 5.1 as I
 had previously thought. Try 5.2, or try this patch on 5.1. If it solves
 your problem, I'll backport it to 4.x for you later this week.


 Fix two bugs causing possible deadlocks or panics, and one nit:
 - Emulate lock draining (LK_DRAIN) in null_lock() to avoid deadlocks
   when the vnode is being recycled.
 - Don't allow null_nodeget() to return a nullfs vnode from the wrong
   mount when multiple nullfs's are mounted. It's unclear why these checks
   were removed in null_subr.c 1.35, but they are definitely necessary.
   Without the checks, trying to unmount a nullfs mount will erroneously
   return EBUSY, and forcibly unmounting with -f will cause a panic.
 - Bump LOG2_SIZEVNODE up to 8, since vnodes are >256 bytes now. The old
   value (7) didn't cause any problems, but made the hash algorithm
   suboptimal.

 These changes fix nullfs enough that a parallel buildworld succeeds.
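
The second item in the list above can be illustrated with a small
user-space model. It assumes a single hash chain shared by all nullfs
mounts and keyed on the lower vnode, as the null_hashget() hunk of the
patch suggests; struct model_node, model_lookup() and the integer mount
ids are hypothetical and only stand in for the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a nullfs node on one hash chain. */
struct model_node {
	const void *lowervp;		/* lower vnode this node shadows */
	int mount_id;			/* which nullfs mount owns the node */
	struct model_node *next;	/* next node on the same chain */
};

/*
 * Walk one hash chain.  With check_mount == 0 this behaves like the old
 * comparison (lower vnode only); with check_mount != 0 it also requires
 * the node to belong to the requested mount, as in the patched code.
 */
const struct model_node *
model_lookup(const struct model_node *head, const void *lowervp,
    int mount_id, int check_mount)
{
	const struct model_node *n;

	assert(lowervp != NULL);
	for (n = head; n != NULL; n = n->next) {
		if (n->lowervp != lowervp)
			continue;
		if (check_mount && n->mount_id != mount_id)
			continue;
		return (n);
	}
	return (NULL);
}
```

With the same lower directory mounted twice, a lookup on behalf of the
second mount that skips the mount check hands back the node created for
the first mount; with the check it finds nothing, so a fresh node would
be created for the second mount instead.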

 Submitted by:   tegge (partially; LK_DRAIN)
 Tested by:      kris
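
Why the old LOG2_SIZEVNODE value was only suboptimal can be seen with a
little arithmetic. The sketch assumes that the bucket index is
(address >> LOG2_SIZEVNODE) & mask, that the mask is 15 (16 buckets,
matching NNULLNODECACHE), and, purely for illustration, that vnodes sit
exactly 256 bytes apart; the commit message only says they are larger
than 256 bytes. model_buckets_used() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MASK	15u	/* assumed hash mask: 16 buckets */

/*
 * Count how many distinct buckets are hit by `count` objects placed
 * `stride` bytes apart, when the bucket index is (addr >> shift) & mask.
 */
unsigned
model_buckets_used(unsigned shift, uintptr_t stride, unsigned count)
{
	unsigned seen = 0;	/* bit i is set once bucket i has been used */
	unsigned used = 0;
	unsigned i;

	assert(shift < sizeof(uintptr_t) * 8);
	for (i = 0; i < count; i++) {
		uintptr_t addr = (uintptr_t)i * stride;
		unsigned bucket = (unsigned)(addr >> shift) & MODEL_MASK;

		if ((seen & (1u << bucket)) == 0) {
			seen |= 1u << bucket;
			used++;
		}
	}
	return (used);
}
```

Under these assumptions model_buckets_used(7, 256, 64) touches 8 of the
16 buckets, while model_buckets_used(8, 256, 64) touches all 16, which
fits the remark that the old value worked but spread nodes less evenly.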

 Index: null.h
 ===================================================================
 RCS file: /home/ncvs/src/sys/fs/nullfs/null.h,v
 retrieving revision 1.18
 diff -u -r1.18 null.h
 --- null.h	13 Jun 2002 21:49:09 -0000	1.18
 +++ null.h	3 Mar 2004 13:28:18 -0000
 @@ -35,7 +35,7 @@
   *
   *	@(#)null.h	8.3 (Berkeley) 8/20/94
   *
 - * $FreeBSD: src/sys/fs/nullfs/null.h,v 1.18 2002/06/13 21:49:09 semenu Exp $
 + * $FreeBSD: src/sys/fs/nullfs/null.h,v 1.19 2003/06/17 08:52:45 tjr Exp $
   */
  
  struct null_mount {
 @@ -51,6 +51,8 @@
  	LIST_ENTRY(null_node)	null_hash;	/* Hash list */
  	struct vnode	        *null_lowervp;	/* VREFed once */
  	struct vnode		*null_vnode;	/* Back pointer */
 +	int			null_pending_locks;
 +	int			null_drain_wakeup;
  };
  
  #define	MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data))
 Index: null_subr.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/fs/nullfs/null_subr.c,v
 retrieving revision 1.40
 diff -u -r1.40 null_subr.c
 --- null_subr.c	19 Feb 2003 05:47:18 -0000	1.40
 +++ null_subr.c	3 Mar 2004 13:28:18 -0000
 @@ -35,7 +35,7 @@
   *
   *	@(#)null_subr.c	8.7 (Berkeley) 5/14/95
   *
 - * $FreeBSD: src/sys/fs/nullfs/null_subr.c,v 1.40 2003/02/19 05:47:18 imp Exp $
 + * $FreeBSD: src/sys/fs/nullfs/null_subr.c,v 1.41 2003/06/17 08:52:45 tjr Exp $
   */
  
  #include <sys/param.h>
 @@ -50,7 +50,7 @@
  
  #include <fs/nullfs/null.h>
  
 -#define LOG2_SIZEVNODE 7		/* log2(sizeof struct vnode) */
 +#define LOG2_SIZEVNODE 8		/* log2(sizeof struct vnode) */
  #define	NNULLNODECACHE 16
  
  /*
 @@ -71,8 +71,8 @@
  static MALLOC_DEFINE(M_NULLFSHASH, "NULLFS hash", "NULLFS hash table");
  MALLOC_DEFINE(M_NULLFSNODE, "NULLFS node", "NULLFS vnode private part");
  
 -static struct vnode * null_hashget(struct vnode *);
 -static struct vnode * null_hashins(struct null_node *);
 +static struct vnode * null_hashget(struct mount *, struct vnode *);
 +static struct vnode * null_hashins(struct mount *, struct null_node *);
  
  /*
   * Initialise cache headers
 @@ -103,7 +103,8 @@
   * Lower vnode should be locked on entry and will be left locked on exit.
   */
  static struct vnode *
 -null_hashget(lowervp)
 +null_hashget(mp, lowervp)
 +	struct mount *mp;
  	struct vnode *lowervp;
  {
  	struct thread *td = curthread;	/* XXX */
 @@ -121,9 +122,20 @@
  loop:
  	mtx_lock(&null_hashmtx);
  	LIST_FOREACH(a, hd, null_hash) {
 -		if (a->null_lowervp == lowervp) {
 +		if (a->null_lowervp == lowervp && NULLTOV(a)->v_mount == mp) {
  			vp = NULLTOV(a);
  			mtx_lock(&vp->v_interlock);
 +			/*
 +			 * Don't block if nullfs vnode is being recycled.
 +			 * We already hold a lock on the lower vnode, thus
 +			 * waiting might deadlock against the thread
 +			 * recycling the nullfs vnode or another thread
 +			 * in vrele() waiting for the vnode lock.
 +			 */
 +			if ((vp->v_iflag & VI_XLOCK) != 0) {
 +				VI_UNLOCK(vp);
 +				continue;
 +			}
  			mtx_unlock(&null_hashmtx);
  			/*
  			 * We need vget for the VXLOCK
 @@ -145,7 +157,8 @@
   * node found.
   */
  static struct vnode *
 -null_hashins(xp)
 +null_hashins(mp, xp)
 +	struct mount *mp;
  	struct null_node *xp;
  {
  	struct thread *td = curthread;	/* XXX */
 @@ -157,9 +170,21 @@
  loop:
  	mtx_lock(&null_hashmtx);
  	LIST_FOREACH(oxp, hd, null_hash) {
 -		if (oxp->null_lowervp == xp->null_lowervp) {
 +		if (oxp->null_lowervp == xp->null_lowervp &&
 +		    NULLTOV(oxp)->v_mount == mp) {
  			ovp = NULLTOV(oxp);
  			mtx_lock(&ovp->v_interlock);
 +			/*
 +			 * Don't block if nullfs vnode is being recycled.
 +			 * We already hold a lock on the lower vnode, thus
 +			 * waiting might deadlock against the thread
 +			 * recycling the nullfs vnode or another thread
 +			 * in vrele() waiting for the vnode lock.
 +			 */
 +			if ((ovp->v_iflag & VI_XLOCK) != 0) {
 +				VI_UNLOCK(ovp);
 +				continue;
 +			}
  			mtx_unlock(&null_hashmtx);
  			if (vget(ovp, LK_EXCLUSIVE | LK_THISLAYER | LK_INTERLOCK, td))
  				goto loop;
 @@ -192,7 +217,7 @@
  	int error;
  
  	/* Lookup the hash firstly */
 -	*vpp = null_hashget(lowervp);
 +	*vpp = null_hashget(mp, lowervp);
  	if (*vpp != NULL) {
  		vrele(lowervp);
  		return (0);
 @@ -222,6 +247,8 @@
  
  	xp->null_vnode = vp;
  	xp->null_lowervp = lowervp;
 +	xp->null_pending_locks = 0;
 +	xp->null_drain_wakeup = 0;
  
  	vp->v_type = lowervp->v_type;
  	vp->v_data = xp;
 @@ -244,7 +271,7 @@
  	 * Atomically insert our new node into the hash or vget existing 
  	 * if someone else has beaten us to it.
  	 */
 -	*vpp = null_hashins(xp);
 +	*vpp = null_hashins(mp, xp);
  	if (*vpp != NULL) {
  		vrele(lowervp);
  		VOP_UNLOCK(vp, LK_THISLAYER, td);
 Index: null_vnops.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/fs/nullfs/null_vnops.c,v
 retrieving revision 1.62
 diff -u -r1.62 null_vnops.c
 --- null_vnops.c	3 Mar 2003 19:15:38 -0000	1.62
 +++ null_vnops.c	3 Mar 2004 13:28:26 -0000
 @@ -40,7 +40,7 @@
   *	...and...
   *	@(#)null_vnodeops.c 1.20 92/07/07 UCLA Ficus project
   *
 - * $FreeBSD: src/sys/fs/nullfs/null_vnops.c,v 1.62 2003/03/03 19:15:38 njl Exp $
 + * $FreeBSD: src/sys/fs/nullfs/null_vnops.c,v 1.63 2003/06/17 08:52:45 tjr Exp $
   */
  
  /*
 @@ -592,6 +592,7 @@
  	struct thread *td = ap->a_td;
  	struct vnode *lvp;
  	int error;
 +	struct null_node *nn;
  
  	if (flags & LK_THISLAYER) {
  		if (vp->v_vnlock != NULL) {
 @@ -614,13 +615,65 @@
  		 * going away doesn't mean the struct lock below us is.
  		 * LK_EXCLUSIVE is fine.
  		 */
 +		if ((flags & LK_INTERLOCK) == 0) {
 +			VI_LOCK(vp);
 +			flags |= LK_INTERLOCK;
 +		}
 +		nn = VTONULL(vp);
  		if ((flags & LK_TYPE_MASK) == LK_DRAIN) {
  			NULLFSDEBUG("null_lock: avoiding LK_DRAIN\n");
 -			return(lockmgr(vp->v_vnlock,
 -				(flags & ~LK_TYPE_MASK) | LK_EXCLUSIVE,
 -				&vp->v_interlock, td));
 +			/*
 +			 * Emulate lock draining by waiting for all other
 +			 * pending locks to complete.  Afterwards the
 +			 * lockmgr call might block, but no other threads
 +			 * will attempt to use this nullfs vnode due to the
 +			 * VI_XLOCK flag.
 +			 */
 +			while (nn->null_pending_locks > 0) {
 +				nn->null_drain_wakeup = 1;
 +				msleep(&nn->null_pending_locks,
 +				       VI_MTX(vp),
 +				       PVFS,
 +				       "nuldr", 0);
 +			}
 +			error = lockmgr(vp->v_vnlock,
 +					(flags & ~LK_TYPE_MASK) | LK_EXCLUSIVE,
 +					VI_MTX(vp), td);
 +			return error;
 +		}
 +		nn->null_pending_locks++;
 +		error = lockmgr(vp->v_vnlock, flags, &vp->v_interlock, td);
 +		VI_LOCK(vp);
 +		/*
 +		 * If we're called from vrele then v_usecount can have been 0
 +		 * and another process might have initiated a recycle 
 +		 * operation.  When that happens, just back out.
 +		 */
 +		if (error == 0 && (vp->v_iflag & VI_XLOCK) != 0 &&
 +		    td != vp->v_vxproc) {
 +			lockmgr(vp->v_vnlock,
 +				(flags & ~LK_TYPE_MASK) | LK_RELEASE,
 +				VI_MTX(vp), td);
 +			VI_LOCK(vp);
 +			error = ENOENT;
 +		}
 +		nn->null_pending_locks--;
 +		/*
 +		 * Wakeup the process draining the vnode after all
 +		 * pending lock attempts has been failed.
 +		 */
 +		if (nn->null_pending_locks == 0 &&
 +		    nn->null_drain_wakeup != 0) {
 +			nn->null_drain_wakeup = 0;
 +			wakeup(&nn->null_pending_locks);
 +		}
 +		if (error == ENOENT && (vp->v_iflag & VI_XLOCK) != 0 &&
 +		    vp->v_vxproc != curthread) {
 +			vp->v_iflag |= VI_XWANT;
 +			msleep(vp, VI_MTX(vp), PINOD, "nulbo", 0);
  		}
 -		return(lockmgr(vp->v_vnlock, flags, &vp->v_interlock, td));
 +		VI_UNLOCK(vp);
 +		return error;
  	} else {
  		/*
  		 * To prevent race conditions involving doing a lookup
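
For readers following the null_lock() hunk above, the drain emulation can be sketched in userland C. This is an illustrative analogue under stated assumptions, not the kernel code: struct node, node_init(), locker_enter(), locker_leave() and drain_wait() are hypothetical names, and a pthread mutex and condition variable stand in for the vnode interlock and the msleep()/wakeup() channel.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userland stand-in for the per-node state the patch adds. */
struct node {
	pthread_mutex_t interlock;	/* plays the role of the vnode interlock */
	pthread_cond_t  drained;	/* stands in for the msleep/wakeup channel */
	int pending_locks;		/* counterpart of null_pending_locks */
	int drain_wakeup;		/* counterpart of null_drain_wakeup */
};

void
node_init(struct node *n)
{
	pthread_mutex_init(&n->interlock, NULL);
	pthread_cond_init(&n->drained, NULL);
	n->pending_locks = 0;
	n->drain_wakeup = 0;
}

/* An ordinary locker announces itself before blocking on the real lock. */
void
locker_enter(struct node *n)
{
	pthread_mutex_lock(&n->interlock);
	n->pending_locks++;
	pthread_mutex_unlock(&n->interlock);
}

/* It retires afterwards and wakes the drainer if it was the last one. */
void
locker_leave(struct node *n)
{
	pthread_mutex_lock(&n->interlock);
	assert(n->pending_locks > 0);
	n->pending_locks--;
	if (n->pending_locks == 0 && n->drain_wakeup != 0) {
		n->drain_wakeup = 0;
		pthread_cond_broadcast(&n->drained);
	}
	pthread_mutex_unlock(&n->interlock);
}

/* The drainer sleeps until no lock attempt is pending any more. */
void
drain_wait(struct node *n)
{
	pthread_mutex_lock(&n->interlock);
	while (n->pending_locks > 0) {
		n->drain_wakeup = 1;
		pthread_cond_wait(&n->drained, &n->interlock);
	}
	pthread_mutex_unlock(&n->interlock);
}
```

In the patch itself the increment and the decrement both happen inside null_lock(), around the lockmgr() call; the sketch splits them into two functions only to make the counting visible.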

 [pm]:
 I've installed 5.2.1-RC2 in place of 5.1-RELEASE on my test box.

 I can no longer reproduce this problem.

 [Helge Oldach <helge.oldach@atosorigin.com>]:
 Did I miss this MFC, or haven't you back-ported it yet?

 Thanks!

 Helge

 [tjr]:
 Sorry, I got distracted by work, university, and a new AMD64 system.
 
 Here's the patch against RELENG_4:
 http://people.freebsd.org/~tjr/nullfs-4.diff

 It works, but I'm not going to commit it until I've had time to
 check a few things first.

 [Dmitry Morozovsky <marck@rinet.ru>]:
 Thanks a lot for your patch! We'll test it in the near future, but in the
 meantime: I see your patch attacks similar bugs in a different way than
 Matt/Cameron did in

 Date:      Thu, 14 Nov 2002 11:29:26 -0800 (PST)
 From:      Matthew Dillon <dillon@apollo.backplane.com>
 To:        "Cameron Grant" <gandalf@vilnya.demon.co.uk>, freebsd-hackers@freebsd.org
 Subject:   Patch #6 (Re: Shared files within a jail)

 Did you ever evaluate their patch? I did use it with r/o nullfs with rather
 good results...
Responsible-Changed-From-To: tjr->freebsd-bugs 
Responsible-Changed-By: tjr 
Responsible-Changed-When: Fri Apr 23 06:21:39 PDT 2004 
Responsible-Changed-Why:  
I no longer have resources to support RELENG_4. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=63662 
State-Changed-From-To: patched->closed 
State-Changed-By: remko 
State-Changed-When: Sun Mar 11 20:38:11 UTC 2007 
State-Changed-Why:  
MFCs have been done to all relevant branches; closing the ticket.

>Unformatted:
