From nobody@FreeBSD.org  Thu Apr 15 07:23:29 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5EBD31065673
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 15 Apr 2010 07:23:29 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 4B4E58FC19
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 15 Apr 2010 07:23:29 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o3F7NTcS087095
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 15 Apr 2010 07:23:29 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o3F7NT9b087094;
	Thu, 15 Apr 2010 07:23:29 GMT
	(envelope-from nobody)
Message-Id: <201004150723.o3F7NT9b087094@www.freebsd.org>
Date: Thu, 15 Apr 2010 07:23:29 GMT
From: Daniel Black <daniel.subs@internode.on.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [pmp][siis] removed SATA device on port multiplier resets entire channel losing all other devices (8.0-stable)
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         145714
>Category:       kern
>Synopsis:       [siis] removed SATA device on port multiplier resets entire channel losing all other devices (8.0-stable)
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr 15 07:30:01 UTC 2010
>Closed-Date:    
>Last-Modified:  Sat Jul  3 15:50:01 UTC 2010
>Originator:     Daniel Black
>Release:        8.0
>Organization:
OVEE
>Environment:
FreeBSD brm00.smartcars.in.nicta.com.au 8.0-STABLE FreeBSD 8.0-STABLE #0: Fri Apr 16 01:53:45 EST 2010     root@brm00.smartcars.in.nicta.com.au:/usr/obj/usr/src/sys/BRM  amd64

cvsup of 8-STABLE as of a few hours before this report
>Description:
A SATA hard drive was physically removed from one port of a
Silicon Image 3726 port multiplier. The kernel log shows the driver
resetting the entire channel, losing the 4 other devices on the port
multiplier. The other devices do not recover, even after repeated resets.

# pciconf -lvc
atapci1@pci0:0:31:2:	class=0x01018a card=0xb0021458 chip=0x3a208086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'SATA2(4Port2) (ICH10 Family)'
    class      = mass storage
    subclass   = ATA
    cap 01[70] = powerspec 3  supports D0 D3  current D0
    cap 13[b0] = PCI Advanced Features: FLR TP
none1@pci0:0:31:3:	class=0x0c0500 card=0x50011458 chip=0x3a308086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'SMB controller  (50011458)'
    class      = serial bus
    subclass   = SMBus
atapci2@pci0:0:31:5:	class=0x010185 card=0xb0021458 chip=0x3a268086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'SATA2(2Port2) (ICH10 Family)'
    class      = mass storage
    subclass   = ATA
    cap 01[70] = powerspec 3  supports D0 D3  current D0
    cap 13[b0] = PCI Advanced Features: FLR TP
siis0@pci0:5:0:0:	class=0x010400 card=0x71321095 chip=0x31321095 rev=0x01 hdr=0x00
    vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device     = 'PCI Express (1x) to 2 Port SATA300 (SiI 3132)'
    class      = mass storage
    subclass   = RAID
    cap 01[54] = powerspec 2  supports D0 D1 D2 D3  current D0
    cap 05[5c] = MSI supports 1 message, 64 bit 
    cap 10[70] = PCI-Express 1 legacy endpoint max data 128(1024) link x1(x1)
siis1@pci0:6:0:0:	class=0x010400 card=0x71321095 chip=0x31321095 rev=0x01 hdr=0x00
    vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device     = 'PCI Express (1x) to 2 Port SATA300 (SiI 3132)'
    class      = mass storage
    subclass   = RAID
    cap 01[54] = powerspec 2  supports D0 D1 D2 D3  current D0
    cap 05[5c] = MSI supports 1 message, 64 bit 
    cap 10[70] = PCI-Express 1 legacy endpoint max data 128(1024) link x1(x1)
atapci0@pci0:7:0:0:	class=0x010185 card=0xb0001458 chip=0x2368197b rev=0x00 hdr=0x00
    vendor     = 'JMicron Technology Corp.'
    device     = 'JMB368 IDE Controller'
    class      = mass storage
    subclass   = ATA
    cap 01[68] = powerspec 2  supports D0 D3  current D0
    cap 10[50] = PCI-Express 1 legacy endpoint IRQ 2 max data 128(128) link x1(x1)

# camcontrol devlist
<ST32000542AS CC34>                at scbus0 target 0 lun 0 (pass0,ada0)
<ST32000542AS CC34>                at scbus0 target 1 lun 0 (pass1,ada1)
<ST32000542AS CC34>                at scbus0 target 2 lun 0 (pass2,ada2)
<ST32000542AS CC34>                at scbus0 target 3 lun 0 (pass3,ada3)
<ST32000542AS CC34>                at scbus0 target 4 lun 0 (pass4,ada4)
<Port Multiplier 37261095 1706>    at scbus0 target 15 lun 0 (pass5,pmp2)
<ST32000542AS CC34>                at scbus3 target 0 lun 0 (pass12,ada10)
<ST32000542AS CC34>                at scbus3 target 1 lun 0 (pass13,ada11)
<ST32000542AS CC34>                at scbus3 target 2 lun 0 (pass14,ada12)
<ST32000542AS CC34>                at scbus3 target 3 lun 0 (pass15,ada13)
<ST32000542AS CC34>                at scbus3 target 4 lun 0 (pass16,ada14)
<Port Multiplier 37261095 1706>    at scbus3 target 15 lun 0 (pass17,pmp1)


# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           2          0
irq8: rtc                         649492        127
irq14: ata0                        62691         12
irq16: uhci0 siis0+               452808         89
irq17: siis1                     6932183       1365
irq18: uhci2 ehci0+                   18          0
cpu0: timer                      5072836        999
irq256: re0                        13586          2
cpu1: timer                      5071093        999
cpu2: timer                      5070729        999
cpu3: timer                      5070492        999
Total                           28395930       5595



dmesg:
Disk ada7 was removed. The other disks behind the same port multiplier (siisch2) are ada5, ada6, ada8 and ada9.

Apr 16 03:53:42 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada7 offset=262144 size=8192 error=6
Apr 16 03:53:42 brm00 kernel: (ada7:siisch2:0:2:0): lost device
Apr 16 03:53:42 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada7 offset=2000398319616 size=8192 error=6
Apr 16 03:53:42 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada7 offset=2000398581760 size=8192 error=6
Apr 16 03:53:52 brm00 kernel: siisch2: port is not ready (timeout 10000ms) status = 001f2000
Apr 16 03:53:52 brm00 kernel: siisch2: device ready timeout
Apr 16 03:53:52 brm00 kernel: siisch2: trying full port reset ...
Apr 16 03:53:52 brm00 kernel: (ada9:siisch2:0:4:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada8:siisch2:0:3:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada6:siisch2:0:1:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada5:siisch2:0:0:0): lost device
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada9 offset=262144 size=8192 error=6
Apr 16 03:53:52 brm00 kernel: (ada7:siisch2:0:2:0): Synchronize cache failed
Apr 16 03:53:52 brm00 kernel: (ada7:siisch2:0:2:0): removing device entry
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada9 offset=2000398319616 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada9 offset=2000398581760 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada8 offset=262144 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada8 offset=2000398319616 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada8 offset=2000398581760 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada6 offset=262144 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada6 offset=2000398319616 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada6 offset=2000398581760 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada5 offset=262144 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada5 offset=2000398319616 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: vdev I/O failure, zpool=tank path=/dev/ada5 offset=2000398581760 size=8192 error=6
Apr 16 03:53:52 brm00 root: ZFS: zpool I/O failure, zpool=tank error=6
Apr 16 03:53:52 brm00 last message repeated 6 times
Apr 16 03:53:52 brm00 kernel: (pmp0:siisch2:0:15:0): lost device
Apr 16 03:53:52 brm00 root: ZFS: zpool I/O failure, zpool=tank error=6
Apr 16 03:53:53 brm00 root: ZFS: vdev failure, zpool=tank type=vdev.no_replicas
Apr 16 03:55:52 brm00 kernel: siisch2: port is not ready (timeout 10000ms) status = 001f2000
Apr 16 03:55:52 brm00 kernel: siisch2: device ready timeout
Apr 16 03:55:52 brm00 kernel: siisch2: trying full port reset ...
Apr 16 03:57:30 brm00 kernel: siisch2: port is not ready (timeout 10000ms) status = 001f2000
Apr 16 03:57:30 brm00 kernel: siisch2: device ready timeout
Apr 16 03:57:30 brm00 kernel: siisch2: trying full port reset ...
Apr 16 03:58:05 brm00 kernel: siisch2: port is not ready (timeout 10000ms) status = 001f2000
Apr 16 03:58:05 brm00 kernel: siisch2: device ready timeout
Apr 16 03:58:05 brm00 kernel: siisch2: trying full port reset ...
Apr 16 03:59:11 brm00 kernel: siisch2: port is not ready (timeout 10000ms) status = 001f2000
Apr 16 03:59:11 brm00 kernel: siisch2: device ready timeout
Apr 16 03:59:11 brm00 kernel: siisch2: trying full port reset ...
Apr 16 04:05:11 brm00 kernel: siisch2: port is not ready (timeout 10000ms) status = 001f2000
Apr 16 04:05:11 brm00 kernel: siisch2: device ready timeout
Apr 16 04:05:11 brm00 kernel: siisch2: trying full port reset ...
Apr 16 04:07:40 brm00 kernel: siisch2: port is not ready (timeout 10000ms) status = 001f2000
Apr 16 04:07:40 brm00 kernel: siisch2: device ready timeout
Apr 16 04:07:40 brm00 kernel: siisch2: trying full port reset ...
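
To confirm from the log that every device on the channel went away at the same moment, the "lost device" messages can be grouped by channel. Below is a minimal Python sketch, assuming each message sits on a single line (the split ada9 message in the raw log has to be rejoined first). The helper name lost_devices_by_channel is hypothetical; it relies only on CAM printing each device path as (periph:sim:bus:target:lun).

```python
import re
from collections import defaultdict

# CAM prefixes messages with the device path (periph:sim:bus:target:lun),
# e.g. "(ada7:siisch2:0:2:0)". Only "lost device" messages are matched.
LOST_RE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<sim>[a-z]+\d+):"
    r"(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\): lost device"
)


def lost_devices_by_channel(log_text: str) -> dict[str, list[tuple[int, str]]]:
    """Map each SIM (controller channel) to the (target, periph) pairs
    reported as lost, sorted by target number."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for m in LOST_RE.finditer(log_text):
        groups[m.group("sim")].append((int(m.group("target")), m.group("periph")))
    return {sim: sorted(devs) for sim, devs in groups.items()}


# Excerpt of the log above, with the split ada9 message rejoined.
LOG_SAMPLE = """\
Apr 16 03:53:42 brm00 kernel: (ada7:siisch2:0:2:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada9:siisch2:0:4:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada8:siisch2:0:3:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada6:siisch2:0:1:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada5:siisch2:0:0:0): lost device
Apr 16 03:53:52 brm00 kernel: (ada7:siisch2:0:2:0): Synchronize cache failed
Apr 16 03:53:52 brm00 kernel: (pmp0:siisch2:0:15:0): lost device
"""

for sim, devs in lost_devices_by_channel(LOG_SAMPLE).items():
    print(sim, ", ".join(f"{periph} (target {t})" for t, periph in devs))
```

On this excerpt every lost device, including the port multiplier itself (target 15), lands under siisch2, which matches the channel-wide loss described above.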


# zpool status -v
(hangs; truss shows no system calls being made)
>How-To-Repeat:
Install 5 disks behind a port multiplier.
Put them in use (e.g. in a raidz2 configuration).
Remove a disk.
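
When reproducing, one way to see exactly which devices a single disk pull takes out is to save the output of `camcontrol devlist` before and after removing the disk and compare the two. Below is a minimal Python sketch, assuming the output format shown in this report. parse_devlist and missing_after are hypothetical helpers, and the "after" snapshot is made up for illustration.

```python
import re

# One line of `camcontrol devlist` output, e.g.
# <ST32000542AS CC34>   at scbus0 target 0 lun 0 (pass0,ada0)
DEVLIST_RE = re.compile(
    r"^<(?P<ident>[^>]*)>\s+at\s+(?P<bus>scbus\d+)\s+target\s+(?P<target>\d+)"
    r"\s+lun\s+(?P<lun>\d+)\s+\((?P<names>[^)]*)\)\s*$"
)


def parse_devlist(text):
    """Return {(bus, target, lun): (periph names...)} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = DEVLIST_RE.match(line.strip())
        if m:
            key = (m.group("bus"), int(m.group("target")), int(m.group("lun")))
            devices[key] = tuple(m.group("names").split(","))
    return devices


def missing_after(before_text, after_text):
    """List devices present in the first snapshot but absent from the second."""
    before = parse_devlist(before_text)
    after = parse_devlist(after_text)
    return sorted((key, names) for key, names in before.items() if key not in after)


BEFORE = """\
<ST32000542AS CC34>                at scbus0 target 0 lun 0 (pass0,ada0)
<ST32000542AS CC34>                at scbus0 target 1 lun 0 (pass1,ada1)
<Port Multiplier 37261095 1706>    at scbus0 target 15 lun 0 (pass5,pmp2)
"""
# Hypothetical snapshot taken after pulling the disk at target 1.
AFTER = """\
<ST32000542AS CC34>                at scbus0 target 0 lun 0 (pass0,ada0)
<Port Multiplier 37261095 1706>    at scbus0 target 15 lun 0 (pass5,pmp2)
"""

for (bus, target, lun), names in missing_after(BEFORE, AFTER):
    print(f"{bus} target {target} lun {lun}: {','.join(names)} disappeared")
```

If the bug reproduces, the list would contain every device on that scbus, not just the pulled disk.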
>Fix:


>Release-Note:
>Audit-Trail:

From: Alexander Motin <mav@FreeBSD.org>
To: bug-followup@FreeBSD.org, daniel.subs@internode.on.net
Cc:  
Subject: Re: kern/145714: [siis] removed SATA device on port multiplier resets
 entire channel losing all other devices (8.0-stable)
Date: Sat, 03 Jul 2010 18:43:50 +0300

 Looks like either the controller or the port multiplier got stuck so hard
 that even a hard reset can't get them out of it. It would be nice to
 replug/power-cycle things one by one to find out where the problem is.
 
 -- 
 Alexander Motin
>Unformatted:
