From Tor.Egge@idi.ntnu.no  Wed Apr 30 21:12:48 1997
Received: from pat.idt.unit.no (0@pat.idt.unit.no [129.241.103.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA06762
          for <FreeBSD-gnats-submit@freebsd.org>; Wed, 30 Apr 1997 21:12:36 -0700 (PDT)
Received: from ikke.idt.unit.no (tegge@ikke.idt.unit.no [129.241.111.65])
	by pat.idt.unit.no (8.8.5/8.8.5) with ESMTP id GAA01415
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 1 May 1997 06:12:27 +0200 (MET DST)
Received: (from tegge@localhost)
	by ikke.idt.unit.no (8.8.5/8.8.5) id GAA00773;
	Thu, 1 May 1997 06:12:27 +0200 (MET DST)
Message-Id: <199705010412.GAA00773@ikke.idt.unit.no>
Date: Thu, 1 May 1997 06:12:27 +0200 (MET DST)
From: Tor Egge <Tor.Egge@idi.ntnu.no>
Reply-To: Tor.Egge@idi.ntnu.no
To: FreeBSD-gnats-submit@freebsd.org
Subject: Use of NFS v3 might cause a trap 12
X-Send-Pr-Version: 3.2

>Number:         3434
>Category:       kern
>Synopsis:       Use of NFS v3 might cause a trap 12
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    steve
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Apr 30 21:20:01 PDT 1997
>Closed-Date:    Sat Aug 23 09:39:24 PDT 1997
>Last-Modified:  Sat Aug 23 09:39:55 PDT 1997
>Originator:     Tor Egge
>Release:        FreeBSD 3.0-CURRENT i386
>Organization:
Norwegian University of Science and Technology, Trondheim, Norway
>Environment:

	FreeBSD ikke.idt.unit.no 3.0-CURRENT FreeBSD 3.0-CURRENT #1: Wed Apr 30 13:57:32 MET DST 1997     root@ikke.idt.unit.no:/usr/src/sys/compile/TEGGE_SMP  i386

>Description:

I recently hit a trap 12 (a kernel-mode page fault, see trap_pfault in the
trace below) in -current. A physical buffer had been added to the list of
dirty buffers associated with an NFS vnode.


----
Current directory is /sys/compile/TEGGE_SMP/OLD/
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd), 
Copyright 1996 Free Software Foundation, Inc...
IdlePTD 274000
current pcb at 21e414
panic: from debugger
#0  boot (howto=256) at ../../kern/kern_shutdown.c:265
(kgdb) where
#0  boot (howto=256) at ../../kern/kern_shutdown.c:265
#1  0xe01196b6 in panic (fmt=0xe01013e9 "from debugger")
    at ../../kern/kern_shutdown.c:393
#2  0xe0101405 in db_panic (dummy1=-535226668, dummy2=0, dummy3=-1, 
    dummy4=0xe94a7cc0 "") at ../../ddb/db_command.c:440
#3  0xe01012f5 in db_command (last_cmdp=0xe0209bb4, cmd_table=0xe0209a04, 
    aux_cmd_tablep=0xe0235414) at ../../ddb/db_command.c:337
#4  0xe0101472 in db_command_loop () at ../../ddb/db_command.c:462
#5  0xe0103c93 in db_trap (type=12, code=0) at ../../ddb/db_trap.c:75
#6  0xe01c6d11 in kdb_trap (type=12, code=0, regs=0xe94a7e04)
    at ../../i386/i386/db_interface.c:140
#7  0xe01d45fb in trap_fatal (frame=0xe94a7e04) at ../../i386/i386/trap.c:749
#8  0xe01d4044 in trap_pfault (frame=0xe94a7e04, usermode=0)
    at ../../i386/i386/trap.c:661
#9  0xe01d3c23 in trap (frame={tf_es = 2147418128, tf_ds = 16, 
      tf_edi = -2023406815, tf_esi = 0, tf_ebp = -380993800, 
      tf_isp = -380994004, tf_ebx = 434176, tf_edx = 16, tf_ecx = 0, 
      tf_eax = -2023406815, tf_trapno = 12, tf_err = 0, tf_eip = -535226668, 
      tf_cs = 8, tf_eflags = 66198, tf_esp = -473724288, tf_ss = 2})
    at ../../i386/i386/trap.c:319
#10 0xe01916d4 in nfs_flush (vp=0xe3c38a80, cred=0xe25e2680, waitfor=2, 
    p=0xe3096e00, commit=1) at ../../nfs/nfs_vnops.c:2777
#11 0xe01915dd in nfs_fsync (ap=0xe94a7f30) at ../../nfs/nfs_vnops.c:2732
#12 0xe018281c in nfs_sync (mp=0xe321be00, waitfor=2, cred=0xe25e2680, 
    p=0xe3096e00) at vnode_if.h:479
#13 0xe013a0df in sync (p=0xe3096e00, uap=0x0, retval=0x0)
    at ../../kern/vfs_syscalls.c:480
#14 0xe0134c1b in vfs_update () at ../../kern/vfs_bio.c:1702
#15 0xe010e116 in kproc_start (udata=0xe020bc90) at ../../kern/init_main.c:249
#16 0xe01c7feb in fork_trampoline ()
#17 0xe01fcd78 in ahc_intr (arg=0x0) at ../../i386/scsi/aic7xxx.c:824
(kgdb) up 10
#10 0xe01916d4 in nfs_flush (vp=0xe3c38a80, cred=0xe25e2680, waitfor=2, 
    p=0xe3096e00, commit=1) at ../../nfs/nfs_vnops.c:2777
(kgdb) print *vp
$1 = {v_flag = 8192, v_usecount = 3, v_writecount = 1, v_holdcnt = 3, 
  v_lastr = 46, v_id = 194063, v_mount = 0xe321be00, v_op = 0xe2fe7300, 
  v_freelist = {tqe_next = 0xe321a880, tqe_prev = 0xe3d8faa0}, v_mntvnodes = {
    le_next = 0xe3c8ac80, le_prev = 0xe3c3cc28}, v_cleanblkhd = {
    lh_first = 0xe700d670}, v_dirtyblkhd = {lh_first = 0xe7046ed4}, 
  v_numoutput = 0, v_type = VREG, v_un = {vu_mountedhere = 0x0, 
    vu_socket = 0x0, vu_specinfo = 0x0, vu_fifoinfo = 0x0}, v_lease = 0x0, 
  v_lastw = 0, v_cstart = 0, v_lasta = 0, v_clen = 0, v_usage = 3, 
  v_object = 0xe4aa2b00, v_interlock = {lock_data = 0}, v_vnlock = 0xe3300bc0, 
  v_tag = VT_NFS, v_data = 0xe3da3d00}
(kgdb) print *vp->v_dirtyblkhd.lh_first
$2 = {b_hash = {le_next = 0xe703c4ac, le_prev = 0xe0222884}, b_vnbufs = {
    le_next = 0xe6ff3e3c, le_prev = 0xe3c38ab4}, b_freelist = {
    tqe_next = 0xe70569c4, tqe_prev = 0xe7034ab4}, b_act = {
    tqe_next = 0xe7043f90, tqe_prev = 0xe2fe9f94}, b_proc = 0x0, 
  b_flags = 553648786, b_qindex = 0, b_usecount = 4 '\004', b_error = 0, 
  b_bufsize = 8192, b_bcount = 8192, b_resid = 0, b_dev = 4294967295, b_un = {
    b_addr = 0xe7693000 "\020"}, b_kvabase = 0xe7693000 "\020", 
  b_kvasize = 8192, b_lblkno = 52, b_blkno = 832, b_iodone = 0, 
  b_iodone_chain = 0x0, b_vp = 0xe3c38a80, b_dirtyoff = 4096, 
  b_dirtyend = 8192, b_rcred = 0x0, b_wcred = 0xe49cf100, b_validoff = 4096, 
  b_validend = 8192, b_pblkno = 65840, b_saveaddr = 0x0, b_savekva = 0x0, 
  b_driver1 = 0x0, b_driver2 = 0x0, b_spc = 0x0, b_cluster = {cluster_head = {
      tqh_first = 0xe70569c4, tqh_last = 0xe7045ac4}, cluster_entry = {
      tqe_next = 0xe70569c4, tqe_prev = 0xe7045ac4}}, b_pages = {0xe081c580, 
    0xe080b44c, 0x0 <repeats 14 times>}, b_npages = 2}
(kgdb) print *vp->v_dirtyblkhd.lh_first->b_vnbufs.le_next
$3 = {b_hash = {le_next = 0x0, le_prev = 0x0}, b_vnbufs = {
    le_next = 0x87654321, le_prev = 0xe7046edc}, b_freelist = {
    tqe_next = 0xe6ff3d60, tqe_prev = 0xe6ff3690}, b_act = {tqe_next = 0x0, 
    tqe_prev = 0x0}, b_proc = 0x0, b_flags = 1610613268, b_qindex = 0, 
  b_usecount = 0 '\000', b_error = 0, b_bufsize = 65536, b_bcount = 65536, 
  b_resid = 0, b_dev = 5378, b_un = {
    b_addr = 0xe8810000 <Address 0xe8810000 out of bounds>}, 
  b_kvabase = 0xe8810000 <Address 0xe8810000 out of bounds>, 
  b_kvasize = 65536, b_lblkno = 176556, b_blkno = 18460672, 
  b_iodone = 0xe0136180 <cluster_callback>, b_iodone_chain = 0x0, b_vp = 0x0, 
  b_dirtyoff = 0, b_dirtyend = 65536, b_rcred = 0x0, b_wcred = 0x0, 
  b_validoff = 0, b_validend = 0, b_pblkno = 0, b_saveaddr = 0x0, 
  b_savekva = 0x0, b_driver1 = 0x0, b_driver2 = 0x0, b_spc = 0x0, b_cluster = {
    cluster_head = {tqh_first = 0xe7026c10, tqh_last = 0xe704c7e8}, 
    cluster_entry = {tqe_next = 0xe7026c10, tqe_prev = 0xe704c7e8}}, 
  b_pages = {0xe05bfde8, 0xe06c721c, 0xe0704150, 0xe0613984, 0xe06035b8, 
    0xe04d10ec, 0xe06b6220, 0xe0848554, 0xe04b7e88, 0xe0557fbc, 0xe078f1f0, 
    0xe0574724, 0xe05b3d58, 0xe058f48c, 0xe06c19c0, 0xe05149f4}, b_npages = 16}
(kgdb) print swbuf
$4 = (struct buf *) 0xe6fed7f8
(kgdb) print swbuf+nswbuf-1
$5 = (struct buf *) 0xe6ff451c
(kgdb) print nswbuf
$6 = 128
(kgdb) print vp->v_dirtyblkhd.lh_first->b_vnbufs.le_next-swbuf
$7 = 119
(kgdb) 
----
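
The last few kgdb commands check by hand that the second entry on the
dirty list lies inside the swbuf array, i.e. that it is a physical buffer.
The addresses agree with this: (0xe6ff3e3c - 0xe6fed7f8) / 119 = 220 bytes
per struct buf. A minimal user-space sketch of the same range check
follows. The struct layout and the name pbuf_index are assumptions made
for illustration only; they are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct buf (assumption:
 * only the size matters for this pointer arithmetic). */
struct buf {
	long b_flags;
	struct buf *b_vnbufs_next;
};

/*
 * Hypothetical helper: return the index of bp inside the pool of
 * npool buffers starting at pool, or -1 if bp does not point at an
 * element of that pool.  Mirrors "print bp - swbuf" plus a bounds
 * check against nswbuf.
 */
long
pbuf_index(const struct buf *pool, int npool, const struct buf *bp)
{
	uintptr_t lo, hi, p;

	assert(npool > 0);
	lo = (uintptr_t)pool;
	hi = (uintptr_t)(pool + npool);
	p = (uintptr_t)bp;

	if (p < lo || p >= hi)
		return (-1);		/* outside the pool */
	if ((p - lo) % sizeof(struct buf) != 0)
		return (-1);		/* not aligned to an element */
	return ((long)((p - lo) / sizeof(struct buf)));
}
```

Called with a pool of 128 entries and a pointer to entry 119, the helper
returns 119, which is the situation shown in the session above.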

To me, it looks like the following has happened:

	getnewbuf needs buffers, and calls vfs_bio_awrite to
	convert a delayed NFS write to an async write. 
	
	vfs_bio_awrite decides that a cluster write is possible,
	and calls cluster_wbuild.

	cluster_wbuild builds a cluster and calls bawrite.

	bawrite sets B_ASYNC, and calls VOP_BWRITE (i.e. nfs_bwrite).

	nfs_bwrite calls nfs_writebp.

	nfs_writebp calls VOP_STRATEGY (i.e. nfs_strategy).

	nfs_strategy calls nfs_doio.

	nfs_doio calls reassignbuf with the cluster as an argument,
	which apparently puts the cluster's physical buffer on the
	vnode's dirty list.

>How-To-Repeat:

	Run three make worlds in parallel in chrooted environments
	(3.0-CURRENT, 2.2-STABLE and 2.1-STABLE). Throw in some
	extra jobs using a lot of memory, CPU and disk I/O.
	All of these jobs may reference local disks only.

	While these are running, write some data over NFS v3.
>Fix:
	
Are physical buffers (e.g. clusters) ever allowed to be on the
dirty or clean buffer lists associated with a vnode?

If yes, then cluster_callback (or relpbuf) might need to remove the physical
buffer from the list.

If no, then reassignbuf or nfs_doio might need to check the B_CLUSTER
flag (e.g. don't put a cluster onto the clean/dirty lists or
don't use NFSV3WRITE_UNSTABLE as iomode for clusters).
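
For the "no" case, the check in reassignbuf could look roughly like the
user-space model below. This is only a sketch of the idea, not a patch:
struct xbuf, struct xvnode, xreassignbuf and the XB_* flag values are all
made up for illustration and do not match the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's B_CLUSTER and B_DELWRI;
 * the numeric values are arbitrary. */
#define XB_CLUSTER	0x1
#define XB_DELWRI	0x2

/* Toy buffer and vnode with singly linked clean/dirty lists. */
struct xbuf {
	int flags;
	struct xbuf *next;
};

struct xvnode {
	struct xbuf *dirty;
	struct xbuf *clean;
};

/*
 * Model of the proposed guard: a cluster (physical) buffer is never
 * queued on a vnode list.  Returns 1 if bp was queued, 0 if skipped.
 */
int
xreassignbuf(struct xbuf *bp, struct xvnode *vp)
{
	struct xbuf **head;

	assert(bp != NULL && vp != NULL);

	if (bp->flags & XB_CLUSTER)
		return (0);	/* keep physical buffers off the lists */

	/* Delayed-write buffers go on the dirty list, others on clean. */
	head = (bp->flags & XB_DELWRI) ? &vp->dirty : &vp->clean;
	bp->next = *head;
	*head = bp;
	return (1);
}
```

With such a guard the vnode lists would only ever contain ordinary
buffers, so nfs_flush would not follow a link into a physical buffer that
has since been released and reused.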
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: steve 
State-Changed-When: Sat Aug 23 09:39:24 PDT 1997 
State-Changed-Why:  
PR should be closed as noted in misc/4028. 
>Unformatted:
