From nobody@FreeBSD.org  Wed Sep  8 16:07:46 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 22D2910656BE
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  8 Sep 2010 16:07:46 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 125448FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  8 Sep 2010 16:07:46 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o88G7iW4062318
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 8 Sep 2010 16:07:44 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o88G7i7U062316;
	Wed, 8 Sep 2010 16:07:44 GMT
	(envelope-from nobody)
Message-Id: <201009081607.o88G7i7U062316@www.freebsd.org>
Date: Wed, 8 Sep 2010 16:07:44 GMT
From: Rich Ercolani <rercola@acm.jhu.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: zfs deadlock when arcmsr reports drive faulted
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         150390
>Category:       kern
>Synopsis:       [zfs] zfs deadlock when arcmsr reports drive faulted
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-fs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 08 16:10:02 UTC 2010
>Closed-Date:    
>Last-Modified:  Sun Sep 12 03:43:21 UTC 2010
>Originator:     Rich Ercolani
>Release:        8.1
>Organization:
JHU ACM
>Environment:
FreeBSD manticore.acm.jhu.edu 8.1-STABLE FreeBSD 8.1-STABLE #4 r211397M: Mon Aug 16 18:47:31 EDT 2010     root@manticore.acm.jhu.edu:/usr/obj/usr/local/ncvs/src/sys/DTRACE  amd64

>Description:
The system deadlocks 100% reliably when the arcmsr card reports a disk as FAULTED.

dmesg looks like:
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5 
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5 
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5 

zpool and zfs-related commands, and all IO to the affected pool, hang forever in state D.

ps and procstat -k report:
[root@manticore ~]# ps aux | grep zpool
stump  3287  0.0  0.0 15700  1540   0  D+   12:03PM   0:00.00 zpool status
root   3286  0.0  0.0 15700  1528   1  T+   12:03PM   0:00.00 zpool status
root   3316  0.0  0.0  9120  1164   3  S+   12:07PM   0:00.00 grep zpool
[root@manticore ~]# procstat -k 3286
  PID    TID COMM             TDNAME           KSTACK                       
 3286 100484 zpool            -                mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall 
[root@manticore ~]# procstat -k 3287
  PID    TID COMM             TDNAME           KSTACK                       
 3287 100532 zpool            -                mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall 
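
Both stacks above are sleeping in spa_config_enter. As a rough aid for spotting the same pattern in a larger procstat -k dump, here is a minimal Python sketch. The function name threads_blocked_in and its parsing assumptions (TDNAME is a single token, frames are whitespace-separated) are illustrative only and are not part of procstat or ZFS.

```python
def threads_blocked_in(procstat_output, frame="spa_config_enter"):
    """Return (pid, tid, comm) for each thread whose kernel stack contains `frame`."""
    blocked = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to hold PID, TID, COMM, TDNAME.
        if len(fields) < 5 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        pid, tid, comm = int(fields[0]), int(fields[1]), fields[2]
        # Assumption: TDNAME is a single token ("-" in the output above),
        # so everything from the fifth field onward is the kernel stack.
        kstack = fields[4:]
        if frame in kstack:
            blocked.append((pid, tid, comm))
    return blocked


# Sample input copied from the procstat -k 3286 output in this report.
sample = """\
  PID    TID COMM             TDNAME           KSTACK
 3286 100484 zpool            -                mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall
"""

print(threads_blocked_in(sample))
```

Fed the output of procstat -k for several PIDs, this lists every thread waiting on the pool configuration lock, which is enough to confirm that new zpool commands are piling up behind the same wait.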

>How-To-Repeat:
1) Have an arcmsr card report a disk in a ZFS pool as FAULTED.
2) zpool and zfs commands, and all IO to the affected pool, hang.
>Fix:


>Release-Note:
>Audit-Trail:

From: Rich <rercola@acm.jhu.edu>
To: bug-followup@freebsd.org
Cc:  
Subject: Re: misc/150390: zfs deadlock when arcmsr reports drive faulted
Date: Wed, 8 Sep 2010 13:00:26 -0400

 An update:
 This occurs only when a disk is marked FAULTED. If you physically
 remove a disk while the system is running, the disk is correctly
 removed from the list of disks in areca-cli, and ZFS reports write
 errors but behaves correctly.
 
 - Rich
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: brucec 
Responsible-Changed-When: Sat Sep 11 14:57:59 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=150390 
>Unformatted:
