From nobody@FreeBSD.org  Mon Jan 19 02:40:37 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 066161065673
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 19 Jan 2009 02:40:37 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id E93E98FC20
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 19 Jan 2009 02:40:36 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n0J2eagl040024
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 19 Jan 2009 02:40:36 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n0J2eaBO040023;
	Mon, 19 Jan 2009 02:40:36 GMT
	(envelope-from nobody)
Message-Id: <200901190240.n0J2eaBO040023@www.freebsd.org>
Date: Mon, 19 Jan 2009 02:40:36 GMT
From: Dylan Simon <dylan@dylex.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ata] DMA errors accessing multiple SATA channels
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         130726
>Category:       kern
>Synopsis:       [ata] DMA errors accessing multiple SATA channels
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jan 19 02:50:03 UTC 2009
>Closed-Date:    
>Last-Modified:  Sun Mar 14 20:00:18 UTC 2010
>Originator:     Dylan Simon
>Release:        8.0-CURRENT 20090114
>Organization:
NYU
>Environment:
FreeBSD lust.cns.nyu.edu 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Wed Jan 14 19:58:58 EST 2009     dylan@lust.cns.nyu.edu:/usr/obj/usr/src/sys/SIN  amd64
>Description:
kernel: ad8: FAILURE - load data
kernel: ad8: setting up DMA failed
kernel: g_vfs_done():ad8s1e[WRITE(offset=1881014272, length=131072)]error = 5
kernel: ad6: FAILURE - load data
kernel: ad6: setting up DMA failed
kernel: g_vfs_done():ad6s1e[READ(offset=4117364736, length=32768)]error = 5
kernel: vnode_pager_getpages: I/O read error

Disk errors of this form occur after a few minutes of disk load.  They seem to occur only when operations attempt to write to SATA disks on different channels at roughly the same time, and they eventually result in panics or hangs.

The problem also occurs with the GENERIC kernel from the 200812 snapshot, and regardless of the legacy/enhanced BIOS setting.  It has been seen with gmirror, ufs, and zfs under both nfs and local access.  It does not occur at all when the accessed disks are on the same channel (e.g., ad6+ad7 or ad8 alone in this case), and it does not occur on the same hardware with Linux under similar conditions.

Hardware: Supermicro C2SEA with 6 SATA ports on ICH10, four identical disks under various configurations.

Matching symptoms have also been seen on an ICH7 machine with two disks on different channels, triggered by the daily periodic scripts; that hardware works fine with 7.1.

atacontrol list:
ATA channel 2:
    Master: acd0 <ATAPI DVD D DH16D3P/1P52> ATA/ATAPI revision 7
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <ST31000333AS/CC1F> Serial ATA II
    Slave:   ad7 <ST31000333AS/CC1F> Serial ATA II
ATA channel 4:
    Master:  ad8 <ST31000333AS/CC1F> Serial ATA II
    Slave:   ad9 <ST31000333AS/CC1F> Serial ATA II
ATA channel 5:
    Master:      no device present
    Slave:       no device present
ATA channel 6:
    Master:      no device present
    Slave:       no device present

pciconf -lv (partial):
pcib3@pci0:0:30:0:      class=0x060401 card=0xb88015d9 chip=0x244e8086 rev=0x90 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 Family (ICH2/3/4/4/5/5/6/7/8/9,63xxESB) Hub Interface to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:      class=0x060100 card=0xb88015d9 chip=0x3a188086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-ISA
atapci1@pci0:0:31:2:    class=0x01018f card=0xb88015d9 chip=0x3a208086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = mass storage
    subclass   = ATA
atapci2@pci0:0:31:5:    class=0x010185 card=0xb88015d9 chip=0x3a268086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = mass storage
    subclass   = ATA

verbose dmesg (partial):
lust kernel: atapci0: <ITE IT8213F UDMA133 controller> port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0xe400-0xe40f irq 22 at device 4.0 on pci3
lust kernel: atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xe400
lust kernel: ioapic0: routing intpin 22 (PCI IRQ 22) to vector 53
lust kernel: atapci0: [MPSAFE]
lust kernel: atapci0: [ITHREAD]
lust kernel: ata2: <ATA channel 0> on atapci0
lust kernel: atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0xec00
lust kernel: atapci0: Reserved 0x4 bytes for rid 0x14 type 4 at 0xe880
lust kernel: ata2: reset tp1 mask=03 ostat0=50 ostat1=00
lust kernel: ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
lust kernel: ata2: stat1=0x00 err=0x00 lsb=0x00 msb=0x00
lust kernel: ata2: reset tp2 stat0=00 stat1=00 devices=0x10000
lust kernel: ata2: [MPSAFE]
lust kernel: ata2: [ITHREAD]
lust kernel: pci3: <serial bus, FireWire> at device 8.0 (no driver attached)
lust kernel: isab0: <PCI-ISA bridge> at device 31.0 on pci0
lust kernel: isa0: <ISA bus> on isab0
lust kernel: atapci1: <Intel ICH10 SATA300 controller> port 0xc400-0xc407,0xc080-0xc083,0xc000-0xc007,0xbc00-0xbc03,0xb880-0xb88f,0xb800-0xb80f irq 19 at device 31.2 on pci0
lust kernel: atapci1: Reserved 0x10 bytes for rid 0x20 type 4 at 0xb880
lust kernel: atapci1: [MPSAFE]
lust kernel: atapci1: [ITHREAD]
lust kernel: atapci1: Reserved 0x10 bytes for rid 0x24 type 4 at 0xb800
lust kernel: ata3: <ATA channel 0> on atapci1
lust kernel: atapci1: Reserved 0x8 bytes for rid 0x10 type 4 at 0xc400
lust kernel: atapci1: Reserved 0x4 bytes for rid 0x14 type 4 at 0xc080
lust kernel: ata3: reset tp1 mask=03 ostat0=50 ostat1=50
lust kernel: ata3: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
lust kernel: ata3: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
lust kernel: ata3: reset tp2 stat0=50 stat1=50 devices=0x3
lust kernel: ata3: [MPSAFE]
lust kernel: ata3: [ITHREAD]
lust kernel: ata4: <ATA channel 1> on atapci1
lust kernel: atapci1: Reserved 0x8 bytes for rid 0x18 type 4 at 0xc000
lust kernel: atapci1: Reserved 0x4 bytes for rid 0x1c type 4 at 0xbc00
lust kernel: ata4: reset tp1 mask=03 ostat0=50 ostat1=50
lust kernel: ata4: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
lust kernel: ata4: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
lust kernel: ata4: reset tp2 stat0=50 stat1=50 devices=0x3
lust kernel: ata4: [MPSAFE]
lust kernel: ata4: [ITHREAD]
lust kernel: pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
lust kernel: atapci2: <Intel ICH10 SATA300 controller> port 0xb400-0xb407,0xb080-0xb083,0xb000-0xb007,0xac00-0xac03,0xa880-0xa88f,0xa800-0xa80f irq 19 at device 31.5 on pci0
lust kernel: atapci2: Reserved 0x10 bytes for rid 0x20 type 4 at 0xa880
lust kernel: atapci2: [MPSAFE]
lust kernel: atapci2: [ITHREAD]
lust kernel: atapci2: Reserved 0x10 bytes for rid 0x24 type 4 at 0xa800
lust kernel: ata5: <ATA channel 0> on atapci2
lust kernel: atapci2: Reserved 0x8 bytes for rid 0x10 type 4 at 0xb400
lust kernel: atapci2: Reserved 0x4 bytes for rid 0x14 type 4 at 0xb080
lust kernel: ata5: reset tp1 mask=03 ostat0=7f ostat1=7f
lust kernel: ata5: stat0=0x7f err=0xff lsb=0xff msb=0xff
lust kernel: ata5: stat1=0x7f err=0xff lsb=0xff msb=0xff
lust kernel: ata5: reset tp2 stat0=ff stat1=ff devices=0x0
lust kernel: ata5: [MPSAFE]
lust kernel: ata5: [ITHREAD]
lust kernel: ata6: <ATA channel 1> on atapci2
lust kernel: atapci2: Reserved 0x8 bytes for rid 0x18 type 4 at 0xb000
lust kernel: atapci2: Reserved 0x4 bytes for rid 0x1c type 4 at 0xac00
lust kernel: ata6: reset tp1 mask=03 ostat0=7f ostat1=7f
lust kernel: ata6: stat0=0x7f err=0xff lsb=0xff msb=0xff
lust kernel: ata6: stat1=0x7f err=0xff lsb=0xff msb=0xff
lust kernel: ata6: reset tp2 stat0=ff stat1=ff devices=0x0
lust kernel: ata6: [MPSAFE]
lust kernel: ata6: [ITHREAD]
lust kernel: ata2: identify ch->devices=00010000
lust kernel: ata2-master: pio=PIO4 wdma=WDMA2 udma=UDMA33 cable=40 wire
lust kernel: acd0: setting PIO4 on IT8213F chip
lust kernel: acd0: setting UDMA33 on IT8213F chip
lust kernel: acd0: <ATAPI DVD D DH16D3P/1P52> DVDROM drive at ata2 as master
lust kernel: acd0: read 8268KB/s (8268KB/s), 198KB buffer, UDMA33
lust kernel: acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, DVDRAM, packet
lust kernel: acd0: Writes:
lust kernel: acd0: Audio: play, 256 volume levels
lust kernel: acd0: Mechanism: ejectable tray, unlocked
lust kernel: acd0: Medium: no/blank disc
lust kernel: ata3: identify ch->devices=00000003
lust kernel: ata3-master: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
lust kernel: ata3-slave: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
lust kernel: ad6: 953869MB <Seagate ST31000333AS CC1F> at ata3-master SATA300
lust kernel: ad6: 1953525168 sectors [1938021C/16H/63S] 16 sectors/interrupt 1 depth queue
lust kernel: GEOM: new disk ad6
lust kernel: ad7: 953869MB <Seagate ST31000333AS CC1F> at ata3-slave SATA300
lust kernel: ad7: 1953525168 sectors [1938021C/16H/63S] 16 sectors/interrupt 1 depth queue
lust kernel: ata4: identify ch->devices=00000003
lust kernel: ata4-master: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
lust kernel: GEOM: new disk ad7
lust kernel: ata4-slave: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
lust kernel: ad8: 953869MB <Seagate ST31000333AS CC1F> at ata4-master SATA300
lust kernel: ad8: 1953525168 sectors [1938021C/16H/63S] 16 sectors/interrupt 1 depth queue
lust kernel: GEOM: new disk ad8
lust kernel: ad9: 953869MB <Seagate ST31000333AS CC1F> at ata4-slave SATA300
lust kernel: ad9: 1953525168 sectors [1938021C/16H/63S] 16 sectors/interrupt 1 depth queue
lust kernel: ata5: identify ch->devices=00000000
lust kernel: ata6: identify ch->devices=00000000
lust kernel: ioapic0: Assigning ISA IRQ 1 to local APIC 0
lust kernel: ioapic0: Assigning ISA IRQ 9 to local APIC 1
lust kernel: ioapic0: Assigning PCI IRQ 17 to local APIC 0
lust kernel: ioapic0: Assigning PCI IRQ 18 to local APIC 1
lust kernel: ioapic0: Assigning PCI IRQ 19 to local APIC 0
lust kernel: ioapic0: Assigning PCI IRQ 22 to local APIC 1
lust kernel: ioapic0: Assigning PCI IRQ 23 to local APIC 0
lust kernel: GEOM: new disk ad9
>How-To-Repeat:
Perform disk access that writes to SATA disks on different channels of an ICH controller at roughly the same time.
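
A load of this kind could be generated with a short script.  The Python sketch below is only an illustration, not the procedure used by the submitter, and it has not been confirmed to trigger the errors.  It starts one writer thread per directory and writes in 128 KiB blocks, matching the length=131072 seen in the failed WRITE above.  The function names and the file name dma_load_test.bin are made up for this example, and the caller is assumed to pass directories that live on disks attached to different channels.

```python
import os
import threading


def write_load(path, total_bytes, block_size=128 * 1024):
    """Write total_bytes of zeros to path in block_size chunks, then fsync."""
    block = b"\0" * block_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data to the disk, not just the cache
    return written


def concurrent_write(dirs, total_bytes, block_size=128 * 1024):
    """Write one test file in each directory at the same time.

    Returns (results, errors): bytes written per directory, and any
    OSError raised per directory (EIO would show up here).
    """
    results = {}
    errors = {}

    def worker(d):
        # Hypothetical file name, chosen only for this sketch.
        path = os.path.join(d, "dma_load_test.bin")
        try:
            results[d] = write_load(path, total_bytes, block_size)
        except OSError as e:
            errors[d] = e

    threads = [threading.Thread(target=worker, args=(d,)) for d in dirs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```

For example, concurrent_write(["/mnt/ad6", "/mnt/ad8"], 1 << 30) would write 1 GiB to each of two mount points; those paths are hypothetical and stand for file systems on ad6 and ad8.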
>Fix:


>Release-Note:
>Audit-Trail:

From: Rong-En Fan <rafan@infor.org>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/130726: DMA errors accessing multiple SATA channels
Date: Mon, 19 Jan 2009 11:48:53 +0800

 For the record, I believe I'm seeing the same thing with today's
 CURRENT. I can trigger a machine crash by copying data from ad2
 to ad0.
 
 For what it's worth, back in 7.x, I could reliably trigger a machine
 hang/crash if I accessed the 2nd disk extensively. It may or may not
 be related to this issue, though.
 
 = atacontrol =
 
 $ atacontrol info ata0
 Master:  ad0 <WDC WD6400AAKS-22A7B0/01.03B01> Serial ATA II
 Slave:       no device present
 $ atacontrol info ata1
 Master:  ad2 <WDC WD5000AAKS-00YGA0/12.01C02> Serial ATA II
 Slave:       no device present
 
 = pciconf = 
 
 atapci0@pci0:0:31:2:    class=0x01018a card=0x101517aa chip=0x27c08086 rev=0x01 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801GB/GR/GH (ICH7 Family) Serial ATA Storage Controller'
     class      = mass storage
     subclass   = ATA
 
 = dmesg =
 
 atapci0: <Intel ICH7 SATA300 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x18b0-0x18bf at device 31.2 on pci0
 atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0x18b0
 ata0: <ATA channel 0> on atapci0
 atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0x1f0
 atapci0: Reserved 0x1 bytes for rid 0x14 type 4 at 0x3f6
 ata0: reset tp1 mask=03 ostat0=50 ostat1=00
 ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
 ata0: reset tp2 stat0=50 stat1=00 devices=0x1
 ioapic0: routing intpin 14 (ISA IRQ 14) to vector 50
 ata0: [MPSAFE]
 ata0: [ITHREAD]
 ata1: <ATA channel 1> on atapci0
 atapci0: Reserved 0x8 bytes for rid 0x18 type 4 at 0x170
 atapci0: Reserved 0x1 bytes for rid 0x1c type 4 at 0x376
 ata1: reset tp1 mask=03 ostat0=50 ostat1=00
 ata1: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 ata1: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
 ata1: reset tp2 stat0=50 stat1=00 devices=0x1
 ioapic0: routing intpin 15 (ISA IRQ 15) to vector 51
 ata1: [MPSAFE]
 ata1: [ITHREAD]
 [...]
 ata0: identify ch->devices=00000001
 ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
 ad0: 610480MB <WDC WD6400AAKS-22A7B0 01.03B01> at ata0-master SATA150
 ad0: 1250263728 sectors [1240341C/16H/63S] 16 sectors/interrupt 1 depth queue
 ata1: identify ch->devices=00000001
 GEOM: new disk ad0
 ata1-master: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
 ad2: 476940MB <WDC WD5000AAKS-00YGA0 12.01C02> at ata1-master SATA150
 ad2: 976773168 sectors [969021C/16H/63S] 16 sectors/interrupt 1 depth queue
 

From: Dylan Alex Simon <dylan@dylex.net>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/130726: DMA errors accessing multiple SATA channels
Date: Mon, 2 Feb 2009 14:23:20 -0500

 I managed to get a 7-STABLE kernel from 20080131 to boot on this machine.  All
 the ata-related boot messages I can see look identical, but it's been working
 fine.  The machine's been up running gmirror/ufs, zfs, and nfs for a couple
 days whereas 8-CURRENT wouldn't last more than a minute or two.

From: Alexander Motin <mav@FreeBSD.org>
To: bug-followup@FreeBSD.org, dylan@dylex.net
Cc:  
Subject: Re: kern/130726: [ata] DMA errors accessing multiple SATA channels
Date: Fri, 12 Mar 2010 07:03:08 +0200

 It is probably some kind of memory management problem, not a hardware
 issue. It would be nice if you tested it with a newer system.
 
 -- 
 Alexander Motin

From: Dylan Alex Simon <dylan@dylex.net>
To: Alexander Motin <mav@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/130726: [ata] DMA errors accessing multiple SATA channels
Date: Sun, 14 Mar 2010 15:30:36 -0400

 A newer RELENG_8 build, or HEAD?  I'm pretty sure there's nothing in 8.0 now I
 haven't tried.  Unfortunately the only machine I have to test with is in
 production, and has been running 7.2 with no problems for over a year now, but
 I might be able to try something specific at some point.
>Unformatted:
