From nobody@FreeBSD.org  Fri Mar  2 18:11:45 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9CE1416A409
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  2 Mar 2007 18:11:45 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [69.147.83.33])
	by mx1.freebsd.org (Postfix) with ESMTP id 7647513C4B7
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  2 Mar 2007 18:11:45 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id l22IBjZF012464
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 2 Mar 2007 18:11:45 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id l22IBjDp012463;
	Fri, 2 Mar 2007 18:11:45 GMT
	(envelope-from nobody)
Message-Id: <200703021811.l22IBjDp012463@www.freebsd.org>
Date: Fri, 2 Mar 2007 18:11:45 GMT
From: Andrew<andrew+pr2@supernews.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: deadlock in g_down -> ahd_action -> contigmalloc
X-Send-Pr-Version: www-3.0

>Number:         109762
>Category:       kern
>Synopsis:       [hang] deadlock in g_down -> ahd_action -> contigmalloc
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 02 18:20:04 GMT 2007
>Closed-Date:    
>Last-Modified:  Thu Aug 23 20:10:07 GMT 2007
>Originator:     Andrew
>Release:        FreeBSD 6.2-20070202
>Organization:
Critical Path, Inc
>Environment:
FreeBSD volcano.supernews.net 6.2-20070202 FreeBSD 6.2-20070202 #0: Fri Feb  2 16:29:10 UTC 2007     root@supernews.net:/usr/obj/usr/src/sys/SUPERNEWS  i386

>Description:
System hung during heavy file write activity (copying a large file between
filesystems). The cause of the hang was g_down being stuck as follows:

Tracing pid 4 tid 100016 td 0xa8303a80
sched_switch(a8303a80,0,1) at sched_switch+0x14b
mi_switch(1,0,a8303a80,ca4249c4,a0515704,...) at mi_switch+0x1ba
sleepq_switch(bc3927f0) at sleepq_switch+0x87
sleepq_wait(bc3927f0,0,a8303a80,44,bc3927f0,...) at sleepq_wait+0x5c
msleep(bc3927f0,a073c7a0,44,a06f43d5,0) at msleep+0x269
bwait(bc3927f0,44,a06f43d5,bc3927f0,0,...) at bwait+0x5f
swap_pager_putpages(ac2d318c,ca424ac4,1,1,ca424a90,...) at swap_pager_putpages+0x48c
default_pager_putpages(ac2d318c,ca424ac4,1,1,ca424a90) at default_pager_putpages+0x18
vm_pageout_flush(ca424ac4,1,1) at vm_pageout_flush+0xcb
vm_contig_launder_page(a49de288) at vm_contig_launder_page+0x2a6
vm_page_alloc_contig(3,0,0,ffffffff,8,0) at vm_page_alloc_contig+0x25c
contigmalloc(3000,a0710ea0,1,0,ffffffff,...) at contigmalloc+0x97
bus_dmamem_alloc(a83d3e80,ab998618,1,ab998610) at bus_dmamem_alloc+0xb4
ahd_alloc_scbs(a8415000) at ahd_alloc_scbs+0x17a
ahd_get_scb(a8415000,8) at ahd_get_scb+0x57
ahd_action(a83f82c0,ab8a4800) at ahd_action+0x103
xpt_run_dev_sendq(a83f8280) at xpt_run_dev_sendq+0x175
xpt_action(ab8a4800) at xpt_action+0x269
dastart(a862c600,ab8a4800,ab8a4800,a86254c0,1) at dastart+0x149
xpt_run_dev_allocq(a83f8280) at xpt_run_dev_allocq+0x82
xpt_schedule(a862c600,1,a9da5bdc,ca424ce8,a04bd420,...) at xpt_schedule+0xef
dastrategy(a9da5bdc) at dastrategy+0x4a
g_disk_start(a9da5528) at g_disk_start+0x18c
g_io_schedule_down(a8303a80) at g_io_schedule_down+0x13b
g_down_procbody(0,ca424d38) at g_down_procbody+0x92
fork_exit(a04bf00c,0,ca424d38) at fork_exit+0x71
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xca424d6c, ebp = 0 ---

Clearly, having g_down wait for a swap pageout to complete is a deadlock:
the pageout is itself disk I/O that has to pass through g_down.

The circumstances under which this happens are not particularly clear -
at the point of the hang, most of the system memory was in the 'inactive'
queue, but the amount of free and/or cached memory was substantial.

Machine has 4GB of RAM of which about 3.3GB is usable (i386, no PAE).
Inactive memory was about 2.2GB, cache 900M, free 5M.

This has been observed twice so far, though attempts to reproduce it in
a consistent fashion have failed and it seems to be relatively rare.


>How-To-Repeat:
Initiate a burst of heavy I/O via the ahd driver, such as copying
multi-gigabyte files or creating them with dd. The other conditions
needed to trigger the hang are not known.
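
As a rough sketch of the dd variant (the helper name, the target path and
the 8 GB size are assumptions, not values taken from the failing system;
bs is given in bytes so the command works with both BSD and GNU dd):

```shell
#!/bin/sh
# Sketch only: generate sustained sequential write load with dd.
# write_big_file is a hypothetical helper; its arguments are the output
# path and the file size in megabytes.
write_big_file() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2"
}

# Example (assumed mount point on a disk behind the ahd controller):
#   write_big_file /mnt/ahd/bigfile 8192
```

On its own this kind of load has not been enough to trigger the hang
reliably.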
>Fix:

>Release-Note:
>Audit-Trail:

From: Frank de Bot <ppi@searchy.net>
To: bug-followup@FreeBSD.org,  andrew+pr2@supernews.net
Cc:  
Subject: Re: kern/109762: [hang] deadlock in g_down -> ahd_action ->
 contigmalloc
Date: Thu, 23 Aug 2007 21:34:51 +0200

 I think I've stumbled on the same bug.
 
 I've managed to reproduce this problem, although I have not been able to
 extract a dump.
 
 Basic hardware information: 2x 2.8GHz Prestonia Xeon, 2x 256MB DDR
 registered ECC RAM, I/O controller: Supermicro AOC-SAT2-MV8
 
 The system runs FreeBSD 6.2, with world as well as kernel built from the
 most recent FreeBSD-6.2-STABLE sources. The kernel is GENERIC with the
 SMP option enabled.
 
 The way I reproduce it:
 
 A RAID5 volume is created with gvinum using the following definition:
 
 .--
 drive drive1 device /dev/ad6
 drive drive2 device /dev/ad8
 drive drive3 device /dev/ad10
 drive drive4 device /dev/ad12
 drive drive5 device /dev/ad14
 drive drive6 device /dev/ad16
 drive drive7 device /dev/ad18
 volume raid5_vol
          plex org raid5 512k
          sd length 476939m drive drive1
          sd length 476939m drive drive2
          sd length 476939m drive drive3
          sd length 476939m drive drive4
          sd length 476939m drive drive5
          sd length 476939m drive drive6
          sd length 476939m drive drive7
 .--
 
 I ran newfs /dev/gvinum/raid5_vol with no options.
 After this I ran iozone with the following parameters:
 
 iozone -r 8 -r 64 -r 4096 -r 16384 -i 0 -i 1 -i 2 -Ra -g 4G > iozonerun.txt
 
 iozone was run in the newly created FS. I ran it 3 times with the output
 redirected to a file on the same FS, and all 3 times the system rebooted.
 The last time I tried to enable the creation of a kernel dump. The
 following screen appeared:
 
 .--
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address  = 0x0
 fault code             = supervisor write, page not present
 instruction pointer    = 0x28:0x58089e8
 stack pointer          = 0x28:0xd44c2cdc
 frame pointer          = 0x28:0xd44c2ce8
 code segment           = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags       = interrupt enabled, resume, IOPL = 0
 current process        = 4 (g_down)
 trap number            = 12
 panic: page fault
 cpuid = 0
 Uptime: 21h9m55s
 Dumper 511 MB (2 chunks)
 swap_pager: indefinite wait in buffer: bufobj: 0, blkno: 62, size: 4096
 swap_pager: indefinite wait in buffer: bufobj: 0, blkno: 62, size: 4096
 swap_pager: indefinite wait in buffer: bufobj: 0, blkno: 62, size: 4096
 swap_pager: indefinite wait in buffer: bufobj: 0, blkno: 62, size: 4096
 swap_pager: indefinite wait in buffer: bufobj: 0, blkno: 62, size: 4096
 .--
 
 The last line repeated until the system was turned off manually.
 
 Between those 3 runs I did one run with the output redirected to the
 root FS. The crash did NOT occur in that run.
 
 
 
 Regards,
 
 Frank de Bot
>Unformatted:
