From nobody@FreeBSD.org  Sat Jan 22 22:54:06 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C26DB1065670
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Jan 2011 22:54:06 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id B0C0D8FC16
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Jan 2011 22:54:06 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p0MMs65C032757
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Jan 2011 22:54:06 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p0MMs6BZ032756;
	Sat, 22 Jan 2011 22:54:06 GMT
	(envelope-from nobody)
Message-Id: <201101222254.p0MMs6BZ032756@red.freebsd.org>
Date: Sat, 22 Jan 2011 22:54:06 GMT
From: Carl <k0802647@telus.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: md getting stuck in wdrain state
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         154228
>Category:       kern
>Synopsis:       [md] md getting stuck in wdrain state
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-fs
>State:          patched
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 22 23:00:21 UTC 2011
>Closed-Date:    
>Last-Modified:  Fri May 06 08:19:08 UTC 2011
>Originator:     Carl
>Release:        FreeBSD-8.1-RELEASE-amd64
>Organization:
>Environment:
FreeBSD xxxxxxxx 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
If I try to observe a 'dd' process in action whilst using it to generate a file inside a particular file-backed memory device, I end up with unkillable hung processes. It is at least faintly reminiscent of this old report:

  http://www.mail-archive.com/freebsd-stable@freebsd.org/msg80511.html

and may be related to bug reports kern/45558 and kern/127420, neither of which appears to have ever been dealt with.

My scenario goes like this. I have a disk image in a large sparse file (60GiB apparent, 28GiB used). The image is taken from an MBR-sliced SSD containing one 34GiB slice housing a bsdlabel. The bsdlabel contains 1 swap and 5 UFS partitions. With the aid of mdconfig, I am mounting only one of the UFS partitions to /media. That partition is 1GiB in size and happens to consist of few or no sparse blocks. All I am trying to do is to zero that partition's unused space with the following:

  dd if=/dev/zero of=/media/zero bs=1M

Because this process seems to be quite slow, I switch to another window (I'm using 'screen') and run "ls /media" or "df". Both of these commands and any other commands I issue that would reference the file-backed memory device in question will immediately hang and become unkillable. The 'dd' process is also hung and unkillable. I have no recourse but to do an undignified reboot because the system as a whole hangs when I try to shut it down. This happens every time with that particular disk image file on this particular host.

The host is running FreeBSD-8.1-RELEASE (amd64) on an Intel Xeon E3110 with 4GiB DRAM, a matched pair of Seagate Constellation ES hard drives, GPT partitions which are gmirrored, and gjournalled UFS2 file systems. It is remote and used by others too, so hanging it is a bad thing.

Refer to "How to repeat the problem" for a test script I wrote which did reproduce the failure once. Here's the relevant process stats after the last time that script hung:

# ps -axl | egrep 'me\dia|ST\AT'
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0  7472  7398   0  51  0  7856  2096 wdrain D+     2    0:00.81 dd if=/dev/zero of=/media/zero bs=1M
    0  7509  7398   0  76  0  8224  1576 suspfs DE+    2    0:00.00 ls /media
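
The same check can be made by matching on the MWCHAN column itself rather than on the command text. This is only a minimal sketch: the function name stuck_procs is made up for illustration, and it assumes the "ps -axl" header carries PID and MWCHAN columns, followed by STAT, TT and TIME, as in the output above.

```sh
# Hypothetical helper (not part of the original report): read "ps -axl"
# output on stdin and print the PID, wait channel and command of every
# process sleeping on "wdrain" or "suspfs".
stuck_procs() {
    awk '
        NR == 1 {
            # Locate the PID and MWCHAN columns by name, not by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "PID")
                    pid = i
                if ($i == "MWCHAN")
                    col = i
            }
            next
        }
        pid && col && ($col == "wdrain" || $col == "suspfs") {
            # Assumes STAT, TT and TIME sit between MWCHAN and COMMAND,
            # so the command text starts at field col + 4.
            cmd = ""
            for (i = col + 4; i <= NF; i++)
                cmd = cmd (cmd == "" ? "" : " ") $i
            print $pid, $col, cmd
        }
    '
}
```

It would be used as "ps -axl | stuck_procs". The header line has to reach the function, so the output must not be filtered through grep first.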

I ran the script in a loop for 12 hours on a different FreeBSD-8.1-RELEASE-i386 host equipped with an Intel Celeron 1.06GHz and 1GiB DRAM, but that system has yet to fail. This second host is obviously much slower hardware, has a single Intel X25-V G2 SSD with no gjournalling, and is essentially idle.

The same script was also run for about an hour without failure on another host, an old Pentium 4 3GHz with 2GiB DRAM and FreeBSD-8.1-RELEASE-i386, with a single hard disk and again no gjournalling or gmirror.

I do not have a second FreeBSD-8.1-RELEASE-amd64 host on which to test this.

I am hoping others can reproduce the problem using that script or some variation on the concept.

Carl                                             / K0802647

>How-To-Repeat:
In an effort to make the problem reproducible for reporting purposes, I tried to devise a script that would approximate my situation. The following script did eventually fail after I ran it numerous times on the same amd64 host, but it usually runs to completion successfully, unlike my original scenario. This suggests a timing-sensitive bug. Because the failure rate is low with this script and I must email someone at the remote site to forcibly reboot the machine once these processes become unkillable, I have been unable to work out further simplifications, although I am sure quite a few are possible:

---------- begin script ----------

#!/bin/sh -ve
truncate -s 1G img.img
mdconfig -f img.img -S 512 -y 16 -x 63 -u 11
gpart create -s MBR md11
gpart add -t freebsd md11
# I expect making the image bootable should be unnecessary.
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 md11
gpart set -a active -i 1 md11
bsdlabel -w md11s1
bsdlabel md11s1 | sed -e '/^ *a:/s/unused/4.2BSD/' > /tmp/b.l
bsdlabel -R md11s1 /tmp/b.l ; rm /tmp/b.l
newfs /dev/md11s1a
# The next 2 lines are weird and probably unnecessary,
# but it is the original scenario.
mdconfig -d -u 11
mdconfig -f img.img -S 512 -y 255 -x 63 -u 11
mount /dev/md11s1a /media || exit
df -h | egrep 'Size|md11'
dd if=/dev/zero of=/media/zero bs=1M &
ps -axl | egrep 'ST\AT|d\d if' || true
while jobid > /dev/null
do
sleep 1
ls /media > /dev/null
df -h | egrep 'md11'
done
ps -axl | egrep 'ST\AT|d\d if' || true
umount /media
mdconfig -d -u 11
rm img.img

---------- end script ----------
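
Since the script above usually runs to completion, it has to be repeated many times before the hang shows up. A minimal sketch of a wrapper for that follows. The function name repeat_until_fail and the file name repro.sh are placeholders, not something from the original setup. A hang of the kind reported here will stall the wrapper as well, because the stuck processes cannot be killed.

```sh
# Hypothetical wrapper: run a command up to "max" times, stopping at the
# first non-zero exit status. Reports how many iterations were run.
# The variables are not local, so the function is best run in a subshell
# or a dedicated script.
repeat_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        if ! "$@"; then
            echo "iteration $i failed"
            return 1
        fi
    done
    echo "completed $i iterations"
}
```

Assuming the script was saved as repro.sh, it could be invoked as "repeat_until_fail 1000 sh ./repro.sh".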

>Fix:
No known fix.

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: kib 
State-Changed-When: Sun Jan 23 19:08:34 UTC 2011 
State-Changed-Why:  
Apparently, the suspension of the filesystem failed to finish, causing 
all writers on the filesystem to block. 

To diagnose the cause, we need the information specified at 
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html 


Responsible-Changed-From-To: freebsd-amd64->freebsd-fs 
Responsible-Changed-By: kib 
Responsible-Changed-When: Sun Jan 23 19:08:34 UTC 2011 
Responsible-Changed-Why:  
UFS issue. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=154228 

From: Carl <k0802647@telus.net>
To: bug-followup@FreeBSD.org, k0802647@telus.net
Cc:  
Subject: Re: kern/154228: [md] md getting stuck in wdrain state
Date: Sun, 23 Jan 2011 15:08:15 -0800

 Now I owe a friend a beer. His assertion was that in submitting this bug 
 report I would incur a request to use a debugger myself, and this 
 despite me being an end user reporting a problem on a production system 
 in a remote location which other people depend on.
 
 While it would be an interesting and educational distraction to rebuild 
 the kernel and deadlock a production system a few more times, I trust 
 it's understood why that can't happen. As such, I thought it would be 
 helpful to provide the above script so FreeBSD developers with more 
 systems at their disposal might try to reproduce the problem. Any chance 
 of that happening?
 
 Carl                                              / K0802647

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/154228: commit references a PR
Date: Wed, 26 Jan 2011 10:34:28 +0000 (UTC)

 Author: kib
 Date: Wed Jan 26 10:34:21 2011
 New Revision: 217880
 URL: http://svn.freebsd.org/changeset/base/217880
 
 Log:
   Treat async buffer writes from the gjournal switcher thread the same as
   from syncer. We shall not sleep on running buffer space when suspending.
   
   Reproduced and tested by:	pho
   PR:	kern/154228
   MFC after:	1 week
 
 Modified:
   head/sys/geom/journal/g_journal.c
 
 Modified: head/sys/geom/journal/g_journal.c
 ==============================================================================
 --- head/sys/geom/journal/g_journal.c	Wed Jan 26 10:08:37 2011	(r217879)
 +++ head/sys/geom/journal/g_journal.c	Wed Jan 26 10:34:21 2011	(r217880)
 @@ -3033,6 +3033,7 @@ g_journal_switcher(void *arg)
  	int error;
  
  	mp = arg;
 +	curthread->td_pflags |= TDP_NORUNNINGBUF;
  	for (;;) {
  		g_journal_switcher_wokenup = 0;
  		error = tsleep(&g_journal_switcher_state, PRIBIO, "jsw:wait",
 

From: Carl <k0802647@telus.net>
To: bug-followup@FreeBSD.org, k0802647@telus.net
Cc:  
Subject: Re: kern/154228: [md] md getting stuck in wdrain state
Date: Tue, 08 Feb 2011 02:59:41 -0800

 For whatever reason I was not copied on the patch message, despite being 
 the bug reporter.
 
 The explanation for that patch is more than a little obscure. In simpler 
 terms, what have you uncovered?
 
 Does that patch implement a complete fix, partial fix, a workaround, or 
 what? Is it recommended I try it?
 
 Did someone manage to reproduce my problem scenario?
 
 Yesterday I ran into the same bug. Similar but different exercise. Again 
 on a remote production system. I had no choice but to try again, so I 
 repeated the procedure, only using a non-sparse file instead. It hung 
 yet again, so that should rule out sparse files as part of the problem.
 
 I noticed in the mdconfig(8) man page this description for the "-o 
 [no]async" option:
 
    'For vnode backed devices: avoid IO_SYNC for increased
     performance but at the risk of deadlocking the entire
     kernel.'
 
 It seems to me the default would be "-o noasync" and that this is 
 supposed to avoid that particular risk for deadlock, but what command 
 can I use to verify whether a particular enabled memory disk is actually 
 using IO_SYNC or not?
 
 Carl                                                   / K0802647
 

From: Carl <k0802647@telus.net>
To: bug-followup@FreeBSD.org, k0802647@telus.net
Cc:  
Subject: Re: kern/154228: [md] md getting stuck in wdrain state
Date: Fri, 11 Feb 2011 23:26:05 -0800

 For the sake of end users suffering from this problem, please elaborate 
 on the patch.
 
 Carl                                                 / K0802647

From: Carl <k0802647@telus.net>
To: bug-followup@FreeBSD.org, k0802647@telus.net
Cc:  
Subject: Re: kern/154228: [md] md getting stuck in wdrain state
Date: Sat, 26 Feb 2011 18:23:38 -0800

 I applied the patch to the FreeBSD-8.1-RELEASE-amd64 system for which 
 I'd filed the bug report. It solved the problem I reported for the 
 scenario in question. Thanks.
 
 Carl                                              / K0802647
State-Changed-From-To: feedback->patched 
State-Changed-By: jh 
State-Changed-When: Fri May 6 08:16:03 UTC 2011 
State-Changed-Why:  
Fixed in head (r217880) and stable/8 (r218188). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=154228 
>Unformatted:
