From nobody@FreeBSD.org  Mon Aug 27 17:52:12 2007
Return-Path: <nobody@FreeBSD.org>
Message-Id: <200708271752.l7RHqCix072412@www.freebsd.org>
Date: Mon, 27 Aug 2007 17:52:12 GMT
From: Hugo <hugo@barafranca.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: msk driver always fails under moderate network load. 
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         115882
>Category:       kern
>Synopsis:       [msk] msk driver always fails under moderate network load.
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    yongari
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 27 18:00:03 GMT 2007
>Closed-Date:    Tue Mar 18 01:48:03 UTC 2008
>Last-Modified:  Fri May 23 06:30:00 UTC 2008
>Originator:     Hugo
>Release:        7.0-CURRENT
>Organization:
>Environment:
FreeBSD nexus.bsdlan.org 7.0-CURRENT FreeBSD 7.0-CURRENT #0: Sun Aug 26 15:56:22 WEST 2007     klr@nexus.bsdlan.org:/usr/obj/usr/src/sys/NEXUS  i386

>Description:
pciconf -lv:
mskc0@pci2:0:0: class=0x020000 card=0x81421043 chip=0x436211ab rev=0x19 hdr=0x00
    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
    device     = 'Yukon 88E8053 PCI-E Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet

/boot/loader.conf:
hw.pci.enable_msix=0
hw.pci.enable_msi=0


/var/log/messages:
Aug 27 18:37:48 nexus kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
Aug 27 18:38:08 nexus last message repeated 2 times
Aug 27 18:38:41 nexus kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
Aug 27 18:39:10 nexus kernel: msk0: watchdog timeout
Aug 27 18:39:10 nexus kernel: msk0: link state changed to DOWN
Aug 27 18:39:12 nexus kernel: msk0: link state changed to UP
Aug 27 18:39:28 nexus kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
Aug 27 18:40:08 nexus kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
Aug 27 18:40:49 nexus last message repeated 3 times
Aug 27 18:40:54 nexus kernel: msk0: watchdog timeout
Aug 27 18:40:54 nexus kernel: msk0: link state changed to DOWN
Aug 27 18:40:56 nexus kernel: msk0: link state changed to UP
Aug 27 18:41:31 nexus kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering



During normal usage (browsing, email, instant messaging) the NIC will work fine. However, very rarely during online gaming and *always* during torrent downloads, the interface will go down with the above messages. It is impossible to bring it back without a reboot. 

The settings described above in loader.conf seem to delay the start of the symptoms, but the problem will always manifest itself.

This did *not* happen with 6.1-RELEASE and the msk driver on Marvell's website.
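
The watchdog messages above can be tallied mechanically when comparing runs (for example with and without MSI). The following is a minimal sketch in C, not part of the driver; the function name count_watchdog_timeouts and its array-of-lines interface are assumptions made for illustration. It treats a syslogd "last message repeated N times" line as N further copies of the line before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count msk(4) watchdog timeouts in an array of syslog lines.
 * A "last message repeated N times" line adds N occurrences, but only
 * if the line it refers to was itself a watchdog message.
 * Hypothetical helper for illustration; not part of any FreeBSD tool.
 */
long
count_watchdog_timeouts(const char *const lines[], size_t nlines)
{
	static const char needle[] = "watchdog timeout";
	static const char repeat[] = "last message repeated ";
	long total = 0;
	int prev_was_watchdog = 0;

	assert(lines != NULL || nlines == 0);
	for (size_t i = 0; i < nlines; i++) {
		const char *rep = strstr(lines[i], repeat);

		if (rep != NULL) {
			/* syslogd collapsed N copies of the previous line. */
			if (prev_was_watchdog)
				total += strtol(rep + strlen(repeat), NULL, 10);
			continue;	/* the "previous message" is unchanged */
		}
		prev_was_watchdog = (strstr(lines[i], needle) != NULL);
		if (prev_was_watchdog)
			total++;
	}
	return (total);
}
```

Applied to the /var/log/messages excerpt above, this counting rule gives 12 watchdog timeouts between 18:37:48 and 18:41:31.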

>How-To-Repeat:
Launch ktorrent and let it download for some time. The failure usually appears within 30 minutes, or within 10 minutes if hw.pci.enable_msix and hw.pci.enable_msi are left enabled.
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->yongari 
Responsible-Changed-By: yongari 
Responsible-Changed-When: Thu Aug 30 04:24:27 UTC 2007 
Responsible-Changed-Why:  
Grab. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=115882 
State-Changed-From-To: open->feedback 
State-Changed-By: yongari 
State-Changed-When: Thu Aug 30 04:27:07 UTC 2007 
State-Changed-Why:  
Could you provide the following information so I can investigate the issue?
- verbose boot messages related to msk(4)
- vmstat -i 
- ifconfig msk0 


State-Changed-From-To: feedback->closed 
State-Changed-By: rwatson 
State-Changed-When: Sun Jan 27 13:02:40 UTC 2008 
State-Changed-Why:  
Close due to feedback timeout (>2 months).  If you have further information 
to help debug this problem, please follow up on the PR by e-mail and we can 
re-open it.  Thanks for the report. 


From: David Schultz <das@FreeBSD.ORG>
To: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Mon, 18 Feb 2008 19:21:05 -0500

 FWIW, I'm seeing this issue, too. It generally happens under heavy
 network load (for a single-user machine), e.g., transferring files
 over the LAN via scp or downloading an ISO image from a website.
 I haven't noticed it with NFS, at least not yet. No amount of fiddling
 in ifconfig fixes things; I'm forced to reboot.
 
 Some info:
 
 Boot messages:
 
 found-> vendor=0x11ab, dev=0x4362, revid=0x22
         domain=0, bus=4, slot=0, func=0
         class=02-00-00, hdrtype=0x00, mfdev=0
         cmdreg=0x0007, statreg=0x0010, cachelnsz=1 (dwords)
         lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
         intpin=a, irq=5
         powerspec 2  supports D0 D1 D2 D3  current D0
         MSI supports 2 messages, 64 bit
         map[10]: type Memory, range 64, base 0xfddfc000, size 14, enabled
 pcib4: requested memory range 0xfddfc000-0xfddfffff: good
         map[18]: type I/O Port, range 32, base 0xbe00, size  8, enabled
 pcib4: requested I/O range 0xbe00-0xbeff: in range
 pcib4: matched entry for 4.0.INTA
 pcib4: slot 0 INTA hardwired to IRQ 17
 mskc0: <Marvell Yukon 88E8053 Gigabit Ethernet> port 0xbe00-0xbeff mem 0xfddfc000-0xfddfffff irq 17 at device 0.0 on pci4
 mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xfddfc000
 mskc0: MSI count : 2
 mskc0: attempting to allocate 2 MSI vectors (2 supported)
 msi: routing MSI IRQ 256 to vector 54
 msi: routing MSI IRQ 257 to vector 55
 mskc0: using IRQs 256-257 for MSI
 mskc0: RAM buffer size : 48KB
 mskc0: Port 0 : Rx Queue 32KB(0x00000000:0x00007fff)
 mskc0: Port 0 : Tx Queue 16KB(0x00008000:0x0000bfff)
 msk0: <Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02> on mskc0
 msk0: bpf attached
 msk0: Ethernet address: 00:01:29:a3:3c:a3
 miibus0: <MII bus> on msk0
 e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 0 on miibus0
 e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
 mskc0: [MPSAFE]
 mskc0: [FILTER]
 [...]
 ioapic0: Assigning ISA IRQ 1 to local APIC 0
 ioapic0: Assigning ISA IRQ 4 to local APIC 1
 ioapic0: Assigning ISA IRQ 9 to local APIC 0
 ioapic0: Assigning ISA IRQ 12 to local APIC 1
 ioapic0: Assigning PCI IRQ 16 to local APIC 0
 ioapic0: Assigning PCI IRQ 18 to local APIC 1
 ioapic0: Assigning PCI IRQ 19 to local APIC 0
 ioapic0: Assigning PCI IRQ 21 to local APIC 1
 ioapic0: Assigning PCI IRQ 22 to local APIC 0
 ioapic0: Assigning PCI IRQ 23 to local APIC 1
 msi: Assigning MSI IRQ 256 to local APIC 0
 
 
 pciconf:
 mskc0@pci0:4:0:0:       class=0x020000 card=0x110215bd chip=0x436211ab rev=0x22 hdr=0x00
     vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
     device     = '88E8053 Marvell Yukon 88E8053 PCI-E Gigabit Ethernet Controller'
     class      = network
     subclass   = ethernet
 
 
 vmstat -i:
 interrupt                          total       rate
 irq1: atkbd0                         414          0
 irq12: psm0                       705928          7
 irq16: uhci0+                     595697          5
 irq18: ehci0 uhci5                     1          0
 irq19: uhci2 uhci*               7274884         72
 irq21: uhci1                       91394          0
 irq22: pcm0                      1300595         13
 irq23: uhci3 ehci1                   287          0
 cpu0: timer                    199843890       1998
 irq256: mskc0                    1707899         17
 cpu1: timer                    199262662       1993
 Total                          410783651       4108
 
 (The interrupt count stops going up when the card stops working.)
 
 
 ifconfig msk0:
 msk0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=19a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
         ether [...] inet [...]
         media: Ethernet autoselect (100baseTX <full-duplex,flag0,flag1>)
         status: active
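
The observation that the irq256 count stops advancing can be checked by sampling `vmstat -i` twice and comparing the totals. A minimal sketch in C, assuming the layout shown above where the last two columns are total and rate; vmstat_irq_total and irq_stalled are hypothetical helper names, not part of any FreeBSD tool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the "total" column (second-to-last field) of one line of
 * `vmstat -i` output if the line mentions `device`, else -1.
 * Example line: "irq256: mskc0        1707899         17"
 */
long long
vmstat_irq_total(const char *line, const char *device)
{
	char tok[64], prev[64] = "", last[64] = "";
	const char *p = line;
	char *end;
	long long total;
	int n;

	assert(line != NULL && device != NULL);
	if (strstr(line, device) == NULL)
		return (-1);
	/* Walk the whitespace-separated fields, keeping the last two. */
	while (sscanf(p, "%63s%n", tok, &n) == 1) {
		strcpy(prev, last);
		strcpy(last, tok);
		p += n;
	}
	if (prev[0] == '\0')
		return (-1);
	total = strtoll(prev, &end, 10);
	if (*end != '\0')
		return (-1);	/* second-to-last field is not a number */
	return (total);
}

/* A source is treated as stalled if its total did not advance. */
int
irq_stalled(long long before, long long after)
{
	return (before >= 0 && after == before);
}
```

Feeding it the mskc0 line from two samples taken a few seconds apart while traffic is flowing would show whether the controller is still raising interrupts.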

From: Pyun YongHyeon <pyunyh@gmail.com>
To: David Schultz <das@FreeBSD.ORG>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Tue, 19 Feb 2008 09:55:07 +0900

 On Mon, Feb 18, 2008 at 07:21:05PM -0500, David Schultz wrote:
  > FWIW, I'm seeing this issue, too. It generally happens under heavy
  > network load (for a single-user machine), e.g., transferring files
  > over the LAN via scp or downloading an ISO image from a website.
 
 I don't think scp transfers or ISO image downloads count as heavy
 network load.
 
  > I haven't noticed it with NFS, at least not yet. No amount of fiddling
  > in ifconfig fixes things; I'm forced to reboot.
  > 
  > Some info:
 
 Missing FreeBSD version?
 
  > 
  > Boot messages:
  > 
  > found-> vendor=0x11ab, dev=0x4362, revid=0x22
  >         domain=0, bus=4, slot=0, func=0
  >         class=02-00-00, hdrtype=0x00, mfdev=0
  >         cmdreg=0x0007, statreg=0x0010, cachelnsz=1 (dwords)
  >         lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
  >         intpin=a, irq=5
  >         powerspec 2  supports D0 D1 D2 D3  current D0
  >         MSI supports 2 messages, 64 bit
  >         map[10]: type Memory, range 64, base 0xfddfc000, size 14, enabled
  > pcib4: requested memory range 0xfddfc000-0xfddfffff: good
  >         map[18]: type I/O Port, range 32, base 0xbe00, size  8, enabled
  > pcib4: requested I/O range 0xbe00-0xbeff: in range
  > pcib4: matched entry for 4.0.INTA
  > pcib4: slot 0 INTA hardwired to IRQ 17
  > mskc0: <Marvell Yukon 88E8053 Gigabit Ethernet> port 0xbe00-0xbeff mem 0xfddfc000-0xfddfffff irq 17 at device 0.0 on pci4
  > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xfddfc000
  > mskc0: MSI count : 2
  > mskc0: attempting to allocate 2 MSI vectors (2 supported)
  > msi: routing MSI IRQ 256 to vector 54
  > msi: routing MSI IRQ 257 to vector 55
  > mskc0: using IRQs 256-257 for MSI
  > mskc0: RAM buffer size : 48KB
  > mskc0: Port 0 : Rx Queue 32KB(0x00000000:0x00007fff)
  > mskc0: Port 0 : Tx Queue 16KB(0x00008000:0x0000bfff)
  > msk0: <Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02> on mskc0
  > msk0: bpf attached
  > msk0: Ethernet address: 00:01:29:a3:3c:a3
  > miibus0: <MII bus> on msk0
  > e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 0 on miibus0
  > e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
  > mskc0: [MPSAFE]
  > mskc0: [FILTER]
  > [...]
  > ioapic0: Assigning ISA IRQ 1 to local APIC 0
  > ioapic0: Assigning ISA IRQ 4 to local APIC 1
  > ioapic0: Assigning ISA IRQ 9 to local APIC 0
  > ioapic0: Assigning ISA IRQ 12 to local APIC 1
  > ioapic0: Assigning PCI IRQ 16 to local APIC 0
  > ioapic0: Assigning PCI IRQ 18 to local APIC 1
  > ioapic0: Assigning PCI IRQ 19 to local APIC 0
  > ioapic0: Assigning PCI IRQ 21 to local APIC 1
  > ioapic0: Assigning PCI IRQ 22 to local APIC 0
  > ioapic0: Assigning PCI IRQ 23 to local APIC 1
  > msi: Assigning MSI IRQ 256 to local APIC 0
  > 
  > 
  > pciconf:
  > mskc0@pci0:4:0:0:       class=0x020000 card=0x110215bd chip=0x436211ab rev=0x22 hdr=0x00
  >     vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
  >     device     = '88E8053 Marvell Yukon 88E8053 PCI-E Gigabit Ethernet Controller'
  >     class      = network
  >     subclass   = ethernet
  > 
  > 
  > vmstat -i:
  > interrupt                          total       rate
  > irq1: atkbd0                         414          0
  > irq12: psm0                       705928          7
  > irq16: uhci0+                     595697          5
  > irq18: ehci0 uhci5                     1          0
  > irq19: uhci2 uhci*               7274884         72
  > irq21: uhci1                       91394          0
  > irq22: pcm0                      1300595         13
  > irq23: uhci3 ehci1                   287          0
  > cpu0: timer                    199843890       1998
  > irq256: mskc0                    1707899         17
  > cpu1: timer                    199262662       1993
  > Total                          410783651       4108
  > 
  > (The interrupt count stops going up when the card stops working.)
  > 
  > 
  > ifconfig msk0:
  > msk0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
  >         options=19a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
  >         ether [...] inet [...]
  >         media: Ethernet autoselect (100baseTX <full-duplex,flag0,flag1>)
  >         status: active
 
 Does the link partner also agree on the 100baseTX full-duplex media
 configuration? If the link partner keeps a counter of pause frames
 transmitted to your box, would you let me know its value?
 
 -- 
 Regards,
 Pyun YongHyeon

From: David Schultz <das@FreeBSD.ORG>
To: Pyun YongHyeon <pyunyh@gmail.com>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Mon, 18 Feb 2008 23:36:27 -0500

 On Tue, Feb 19, 2008, Pyun YongHyeon wrote:
 > On Mon, Feb 18, 2008 at 07:21:05PM -0500, David Schultz wrote:
 >  > FWIW, I'm seeing this issue, too. It generally happens under heavy
 >  > network load (for a single-user machine), e.g., transferring files
 >  > over the LAN via scp or downloading an ISO image from a website.
 > 
 > I don't think scp transfers or ISO image downloads count as heavy
 > network load.
 
 Okay, then call them moderate loads if you will, but the point is
 that the card is still having problems!
 
 > 
 >  > I haven't noticed it with NFS, at least not yet. No amount of fiddling
 >  > in ifconfig fixes things; I'm forced to reboot.
 >  > 
 >  > Some info:
 > 
 > Missing FreeBSD version?
 
 Aah, sorry. It's an amd64 -CURRENT with sources from 2/6.
 
 >  > ifconfig msk0:
 >  > msk0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >  >         options=19a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
 >  >         ether [...] inet [...]
 >  >         media: Ethernet autoselect (100baseTX <full-duplex,flag0,flag1>)
 >  >         status: active
 > 
 > Does the link partner also agree on the 100baseTX full-duplex media
 > configuration? If the link partner keeps a counter of pause frames
 > transmitted to your box, would you let me know its value?
 
 Currently it's connected to a switch that I do not have access to,
 but previously I had it connected to another box with a card
 supported by the em(4) driver. At that time, both ends agreed to
 1000baseTX and the msk card still wedged after a few minutes of
 heavy traffic.
 
 As for pause frames, I'm not sure. I can reconnect it to that box
 and try to reproduce the problem, but I don't know how offhand to
 get the em(4) driver to give me info on link layer control
 packets.
State-Changed-From-To: closed->open 
State-Changed-By: linimon 
State-Changed-When: Tue Feb 19 15:42:42 UTC 2008 
State-Changed-Why:  
Re-open with new data.  The following 2 email messages got caught in 
the spamtrap due to a spamassassin outage. 

Date: Tue, 19 Feb 2008 14:11:49 +0900 
From: Pyun YongHyeon <pyunyh@gmail.com> 

On Mon, Feb 18, 2008 at 11:36:27PM -0500, David Schultz wrote: 
> On Tue, Feb 19, 2008, Pyun YongHyeon wrote: 
> > On Mon, Feb 18, 2008 at 07:21:05PM -0500, David Schultz wrote: 
> >  > FWIW, I'm seeing this issue, too. It generally happens under heavy 
> >  > network load (for a single-user machine), e.g., transferring files 
> >  > over the LAN via scp or downloading an ISO image from a website. 
> >  
> > I don't think scp transfers or ISO image downloads count as heavy
> > network load.
>  
> Okay, then call them moderate loads if you will, but the point is 
> that the card is still having problems! 
>  

I see. :-) 

> >  
> >  > I haven't noticed it with NFS, at least not yet. No amount of fiddling 
> >  > in ifconfig fixes things; I'm forced to reboot. 
> >  >  
> >  > Some info: 
> >  
> > Missing FreeBSD version? 
>  
> Aah, sorry. It's an amd64 -CURRENT with sources from 2/6. 
>  

Ok. 

> >  > ifconfig msk0: 
> >  > msk0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 
> >  >         options=19a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4> 
> >  >         ether [...] inet [...] 
> >  >         media: Ethernet autoselect (100baseTX <full-duplex,flag0,flag1>) 
> >  >         status: active 
> >  
> > Does the link partner also agree on the 100baseTX full-duplex media
> > configuration? If the link partner keeps a counter of pause frames
> > transmitted to your box, would you let me know its value?
>  
> Currently it's connected to a switch that I do not have access to, 
> but previously I had it connected to another box with a card 
> supported by the em(4) driver. At that time, both ends agreed to 
> 1000baseTX and the msk card still wedged after a few minutes of 
> heavy traffic. 
>  

It seems that you can reliably reproduce the issue. Would you let 
me know what commands were used to wedge msk(4)? 
I can't reproduce it with scping/downloading files or netperf tests. 

> As for pause frames, I'm not sure. I can reconnect it to that box 
> and try to reproduce the problem, but I don't know how offhand to 
> get the em(4) driver to give me info on link layer control 
> packets. 

Does interface down and re-up make it work again? 

--  
Regards, 
Pyun YongHyeon 

Date: Tue, 19 Feb 2008 01:01:54 -0500 
From: David Schultz <das@FreeBSD.ORG> 

On Tue, Feb 19, 2008, Pyun YongHyeon wrote: 
>  > Currently it's connected to a switch that I do not have access to, 
>  > but previously I had it connected to another box with a card 
>  > supported by the em(4) driver. At that time, both ends agreed to 
>  > 1000baseTX and the msk card still wedged after a few minutes of 
>  > heavy traffic. 
>  >  
>  
> It seems that you can reliably reproduce the issue. Would you let 
> me know what commands were used to wedge msk(4)? 
> I can't reproduce it with scping/downloading files or netperf tests. 

Generally anything that tries to transfer a lot of data from a 
remote host via TCP seems to do it. It does take some time to 
reproduce, so it's not like I can just type a single command to do 
it. Most recently it was downloading a 67 MB tarball from a fast 
HTTP server with firefox, before that it was scping a large file, 
and before that it was another HTTP connection. 

>  > As for pause frames, I'm not sure. I can reconnect it to that box 
>  > and try to reproduce the problem, but I don't know how offhand to 
>  > get the em(4) driver to give me info on link layer control 
>  > packets. 
>  
> Does interface down and re-up make it work again? 

No, I tried that many times. I also fiddled with all of the 
options in ifconfig that I could find, but in the end I just had 
to reboot the machine. 

Is there any other diagnostic info that would help? As I said, 
once it wedges, the total interrupt count in vmstat -i stops 
increasing. Maybe the watchdog timeout code isn't reenabling 
interrupts on the card properly or something... 



From: Pyun YongHyeon <pyunyh@gmail.com>
To: David Schultz <das@FreeBSD.ORG>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Thu, 21 Feb 2008 12:44:38 +0900

 --6TrnltStXW4iwmi0
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 On Tue, Feb 19, 2008 at 01:01:54AM -0500, David Schultz wrote:
  > On Tue, Feb 19, 2008, Pyun YongHyeon wrote:
  > >  > Currently it's connected to a switch that I do not have access to,
  > >  > but previously I had it connected to another box with a card
  > >  > supported by the em(4) driver. At that time, both ends agreed to
  > >  > 1000baseTX and the msk card still wedged after a few minutes of
  > >  > heavy traffic.
  > >  > 
  > > 
  > > It seems that you can reliably reproduce the issue. Would you let
  > > me know what commands were used to wedge msk(4)?
  > > I can't reproduce it with scping/downloading files or netperf tests.
  > 
  > Generally anything that tries to transfer a lot of data from a
  > remote host via TCP seems to do it. It does take some time to
  > reproduce, so it's not like I can just type a single command to do
  > it. Most recently it was downloading a 67 MB tarball from a fast
  > HTTP server with firefox, before that it was scping a large file,
  > and before that it was another HTTP connection.
  > 
  > >  > As for pause frames, I'm not sure. I can reconnect it to that box
  > >  > and try to reproduce the problem, but I don't know how offhand to
  > >  > get the em(4) driver to give me info on link layer control
  > >  > packets.
  > > 
  > > Does interface down and re-up make it work again?
  > 
  > No, I tried that many times. I also fiddled with all of the
  > options in ifconfig that I could find, but in the end I just had
  > to reboot the machine.
  > 
 
 Hmm, that looks like a hardware hang, with the PCI hardware reset on
 reboot recovering the controller. If this is right, the issue you're
 seeing is not related to this PR.
 
  > Is there any other diagnostic info that would help? As I said,
  > once it wedges, the total interrupt count in vmstat -i stops
  > increasing. Maybe the watchdog timeout code isn't reenabling
  > interrupts on the card properly or something...
 
 I still can't reproduce it, but would you try the attached patch?
 
 -- 
 Regards,
 Pyun YongHyeon
 
 --6TrnltStXW4iwmi0
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="msk.pause.patch"
 
 --- sys/dev/msk/if_msk.c.orig	2008-02-04 09:59:06.000000000 +0900
 +++ sys/dev/msk/if_msk.c	2008-02-21 12:35:51.000000000 +0900
 @@ -3658,9 +3658,13 @@
  	CSR_WRITE_4(sc, MR_ADDR(sc_if->msk_port, RX_GMF_FL_MSK),
  	    GMR_FS_ANY_ERR);
  
 -	/* Set Rx FIFO flush threshold to 64 bytes. */
 +	/*
 +	 * Set Rx FIFO flush threshold to 64 bytes.
 +	 * Increase the threshold by a single unit to work around
 +	 * a hardware hang on pause frames.
 +	 */
  	CSR_WRITE_4(sc, MR_ADDR(sc_if->msk_port, RX_GMF_FL_THR),
 -	    RX_GMF_FL_THR_DEF);
 +	    RX_GMF_FL_THR_DEF + 1);
  
  	/* Configure Tx MAC FIFO. */
  	CSR_WRITE_4(sc, MR_ADDR(sc_if->msk_port, TX_GMF_CTRL_T), GMF_RST_SET);
 --- sys/dev/msk/if_mskreg.h.orig	2007-12-05 18:41:58.000000000 +0900
 +++ sys/dev/msk/if_mskreg.h	2008-02-21 12:31:00.000000000 +0900
 @@ -1818,6 +1818,7 @@
  			GMR_FS_LONG_ERR | \
  			GMR_FS_MII_ERR | \
  			GMR_FS_BAD_FC | \
 +			GMR_FS_GOOD_FC | \
  			GMR_FS_UN_SIZE | \
  			GMR_FS_JABBER)
  
 
 --6TrnltStXW4iwmi0--
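
For readers unfamiliar with the second hunk: it ORs GMR_FS_GOOD_FC into a status mask, presumably the GMR_FS_ANY_ERR value that the first hunk's context writes to RX_GMF_FL_MSK (the macro name is not visible in the hunk itself). Assuming the Rx FIFO flushes any frame whose status has a masked bit set, which is an assumption about the hardware and not something the patch states, the effect is that valid pause frames are flushed as well. The sketch below illustrates this with placeholder bit values; the FS_* constants and both function names are hypothetical and do not come from if_mskreg.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Placeholder bit positions for illustration only. The real values
 * live in sys/dev/msk/if_mskreg.h and are not shown in the patch.
 */
#define FS_BAD_FC	(1u << 0)	/* malformed flow-control frame */
#define FS_GOOD_FC	(1u << 1)	/* valid flow-control (pause) frame */
#define FS_JABBER	(1u << 2)	/* jabber error */

#define MASK_BEFORE	(FS_BAD_FC | FS_JABBER)
#define MASK_AFTER	(FS_BAD_FC | FS_GOOD_FC | FS_JABBER)

/* Assumed semantics: a frame is flushed if any masked status bit is set. */
int
rx_fifo_would_flush(uint32_t frame_status, uint32_t flush_mask)
{
	return ((frame_status & flush_mask) != 0);
}

/* Bits present in `after` but not in `before`. */
uint32_t
mask_added_bits(uint32_t before, uint32_t after)
{
	assert((before & ~after) == 0);	/* the patch only adds bits */
	return (after & ~before);
}
```

Under these assumptions a frame flagged only as a good flow-control frame passes the old mask and is flushed by the new one, while frames with error bits are flushed either way.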

From: David Schultz <das@FreeBSD.ORG>
To: Pyun YongHyeon <pyunyh@gmail.com>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Sun, 24 Feb 2008 12:57:05 -0500

 On Thu, Feb 21, 2008, Pyun YongHyeon wrote:
 >  > Is there any other diagnostic info that would help? As I said,
 >  > once it wedges, the total interrupt count in vmstat -i stops
 >  > increasing. Maybe the watchdog timeout code isn't reenabling
 >  > interrupts on the card properly or something...
 > 
 > I still can't reproduce it, but would you try the attached patch?
 
 I've been running with the patch for 2 days now, and it hasn't
 hung yet, so it seems like this fixed the problem! I'll keep
 exercising it for a few more days and let you know if there are
 any further problems.

From: Pyun YongHyeon <pyunyh@gmail.com>
To: David Schultz <das@FreeBSD.ORG>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Mon, 25 Feb 2008 12:40:43 +0900

 On Sun, Feb 24, 2008 at 12:57:05PM -0500, David Schultz wrote:
  > On Thu, Feb 21, 2008, Pyun YongHyeon wrote:
  > >  > Is there any other diagnostic info that would help? As I said,
  > >  > once it wedges, the total interrupt count in vmstat -i stops
  > >  > increasing. Maybe the watchdog timeout code isn't reenabling
  > >  > interrupts on the card properly or something...
  > > 
  > > I still can't reproduce it, but would you try the attached patch?
  > 
  > I've been running with the patch for 2 days now, and it hasn't
  > hung yet, so it seems like this fixed the problem! I'll keep
  > exercising it for a few more days and let you know if there are
  > any further problems.
 
 Thanks for testing. I'll wait one more week and commit the patch.
 If you encounter the issue again please let me know asap.
 
 Thanks.
 -- 
 Regards,
 Pyun YongHyeon

From: David Schultz <das@FreeBSD.ORG>
To: Pyun YongHyeon <pyunyh@gmail.com>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Thu, 28 Feb 2008 18:32:39 -0500

 On Mon, Feb 25, 2008, Pyun YongHyeon wrote:
 > On Sun, Feb 24, 2008 at 12:57:05PM -0500, David Schultz wrote:
 >  > On Thu, Feb 21, 2008, Pyun YongHyeon wrote:
 >  > >  > Is there any other diagnostic info that would help? As I said,
 >  > >  > once it wedges, the total interrupt count in vmstat -i stops
 >  > >  > increasing. Maybe the watchdog timeout code isn't reenabling
 >  > >  > interrupts on the card properly or something...
 >  > > 
 >  > > I still can't reproduce it, but would you try the attached patch?
 >  > 
 >  > I've been running with the patch for 2 days now, and it hasn't
 >  > hung yet, so it seems like this fixed the problem! I'll keep
 >  > exercising it for a few more days and let you know if there are
 >  > any further problems.
 > 
 > Thanks for testing. I'll wait one more week and commit the patch.
 > If you encounter the issue again please let me know asap.
 
 Sigh, it happened again. This time the interface was mostly idle, too:
 
 Feb 28 12:10:27 zim kernel: msk0: watchdog timeout
 Feb 28 12:10:27 zim kernel: msk0: link state changed to DOWN
 Feb 28 12:10:29 zim kernel: msk0: link state changed to UP
 Feb 28 12:10:41 zim kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
 Feb 28 12:11:22 zim last message repeated 4 times
 Feb 28 12:12:51 zim last message repeated 6 times
 Feb 28 12:13:14 zim kernel: msk0: watchdog timeout
 Feb 28 12:13:14 zim kernel: msk0: link state changed to DOWN
 Feb 28 12:13:17 zim kernel: msk0: link state changed to UP
 Feb 28 12:13:25 zim kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
 Feb 28 12:13:54 zim last message repeated 3 times
 Feb 28 12:15:09 zim last message repeated 5 times
 
 This time I did not compile the msk driver into the kernel.
 Unloading and then reloading the module fixed the problem without
 rebooting. (Kernel message log below in case it is somehow
 useful.)
 
 Feb 28 18:26:55 zim kernel: e1000phy0: detached
 Feb 28 18:26:55 zim kernel: miibus0: detached
 Feb 28 18:26:55 zim kernel: msk0: detached
 Feb 28 18:26:55 zim kernel: mskc0: detached
 Feb 28 18:27:07 zim kernel: pci0: driver added
 Feb 28 18:27:07 zim kernel: found->     vendor=0x8086, dev=0x2930, revid=0x02
 Feb 28 18:27:07 zim kernel: domain=0, bus=0, slot=31, func=3
 Feb 28 18:27:07 zim kernel: class=0c-05-00, hdrtype=0x00, mfdev=0
 Feb 28 18:27:07 zim kernel: cmdreg=0x0003, statreg=0x0280, cachelnsz=0 (dwords)
 Feb 28 18:27:07 zim kernel: lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
 Feb 28 18:27:07 zim kernel: intpin=b, irq=18
 Feb 28 18:27:07 zim kernel: pci0:0:31:3: reprobing on driver added
 Feb 28 18:27:07 zim kernel: pci1: driver added
 Feb 28 18:27:07 zim kernel: pci2: driver added
 Feb 28 18:27:07 zim kernel: pci3: driver added
 Feb 28 18:27:07 zim kernel: pci4: driver added
 Feb 28 18:27:07 zim kernel: found->     vendor=0x11ab, dev=0x4362, revid=0x22
 Feb 28 18:27:07 zim kernel: domain=0, bus=4, slot=0, func=0
 Feb 28 18:27:07 zim kernel: class=02-00-00, hdrtype=0x00, mfdev=0
 Feb 28 18:27:07 zim kernel: cmdreg=0x0007, statreg=0x0010, cachelnsz=1 (dwords)
 Feb 28 18:27:07 zim kernel: lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
 Feb 28 18:27:07 zim kernel: intpin=a, irq=17
 Feb 28 18:27:07 zim kernel: powerspec 2  supports D0 D1 D2 D3  current D0
 Feb 28 18:27:07 zim kernel: MSI supports 2 messages, 64 bit
 Feb 28 18:27:07 zim kernel: pci0:4:0:0: reprobing on driver added
 Feb 28 18:27:07 zim kernel: mskc0: <Marvell Yukon 88E8053 Gigabit Ethernet> port 0xae00-0xaeff mem 0xfdefc000-0xfdefffff irq 17 at device 0.0 on pci4
 Feb 28 18:27:07 zim kernel: pcib4: mskc0 requested memory range 0xfdefc000-0xfdefffff: good
 Feb 28 18:27:07 zim kernel: mskc0: MSI count : 2
 Feb 28 18:27:07 zim kernel: mskc0: attempting to allocate 2 MSI vectors (2 supported)
 Feb 28 18:27:07 zim kernel: msi: routing MSI IRQ 256 to vector 58
 Feb 28 18:27:07 zim kernel: msi: routing MSI IRQ 257 to vector 59
 Feb 28 18:27:07 zim kernel: mskc0: using IRQs 256-257 for MSI
 Feb 28 18:27:07 zim kernel: mskc0: RAM buffer size : 48KB
 Feb 28 18:27:07 zim kernel: mskc0: Port 0 : Rx Queue 32KB(0x00000000:0x00007fff)
 Feb 28 18:27:07 zim kernel: mskc0: Port 0 : Tx Queue 16KB(0x00008000:0x0000bfff)
 Feb 28 18:27:07 zim kernel: msi: Assigning MSI IRQ 256 to local APIC 0
 Feb 28 18:27:07 zim kernel: mskc0: [MPSAFE]
 Feb 28 18:27:07 zim kernel: mskc0: [FILTER]
 Feb 28 18:27:07 zim kernel: pci5: driver added
 Feb 28 18:27:07 zim kernel: msk0: <Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02> on mskc0
 Feb 28 18:27:07 zim kernel: msk0: bpf attached
 Feb 28 18:27:07 zim kernel: msk0: Ethernet address: 00:01:29:a3:3c:a3
 Feb 28 18:27:07 zim kernel: miibus0: <MII bus> on msk0
 Feb 28 18:27:07 zim kernel: e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 0 on miibus0
 Feb 28 18:27:07 zim kernel: e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
 Feb 28 18:27:07 zim kernel: msk0: link state changed to DOWN
 Feb 28 18:27:09 zim kernel: msk0: link state changed to UP

From: Pyun YongHyeon <pyunyh@gmail.com>
To: David Schultz <das@FreeBSD.ORG>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Fri, 29 Feb 2008 12:44:08 +0900

 On Thu, Feb 28, 2008 at 06:32:39PM -0500, David Schultz wrote:
  > On Mon, Feb 25, 2008, Pyun YongHyeon wrote:
  > > On Sun, Feb 24, 2008 at 12:57:05PM -0500, David Schultz wrote:
  > >  > On Thu, Feb 21, 2008, Pyun YongHyeon wrote:
  > >  > >  > Is there any other diagnostic info that would help? As I said,
  > >  > >  > once it wedges, the total interrupt count in vmstat -i stops
  > >  > >  > increasing. Maybe the watchdog timeout code isn't reenabling
  > >  > >  > interrupts on the card properly or something...
  > >  > > 
  > >  > > I still can't reproduce it, but would you try the attached patch?
  > >  > 
  > >  > I've been running with the patch for 2 days now, and it hasn't
  > >  > hung yet, so it seems like this fixed the problem! I'll keep
  > >  > exercising it for a few more days and let you know if there are
  > >  > any further problems.
  > > 
  > > Thanks for testing. I'll wait one more week and commit the patch.
  > > If you encounter the issue again please let me know asap.
  > 
  > Sigh, it happened again. This time the interface was mostly idle, too:
  > 
  > Feb 28 12:10:27 zim kernel: msk0: watchdog timeout
  > Feb 28 12:10:27 zim kernel: msk0: link state changed to DOWN
  > Feb 28 12:10:29 zim kernel: msk0: link state changed to UP
  > Feb 28 12:10:41 zim kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
  > Feb 28 12:11:22 zim last message repeated 4 times
  > Feb 28 12:12:51 zim last message repeated 6 times
  > Feb 28 12:13:14 zim kernel: msk0: watchdog timeout
  > Feb 28 12:13:14 zim kernel: msk0: link state changed to DOWN
  > Feb 28 12:13:17 zim kernel: msk0: link state changed to UP
  > Feb 28 12:13:25 zim kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
  > Feb 28 12:13:54 zim last message repeated 3 times
  > Feb 28 12:15:09 zim last message repeated 5 times
  > 
  > This time I did not compile the msk driver into the kernel.
  > Unloading and then reloading the module fixed the problem without
  > rebooting. (Kernel message log below in case it is somehow
  > useful.)
  > 
  > Feb 28 18:26:55 zim kernel: e1000phy0: detached
  > Feb 28 18:26:55 zim kernel: miibus0: detached
  > Feb 28 18:26:55 zim kernel: msk0: detached
  > Feb 28 18:26:55 zim kernel: mskc0: detached
  > Feb 28 18:27:07 zim kernel: pci0: driver added
  > Feb 28 18:27:07 zim kernel: found->     vendor=0x8086, dev=0x2930, revid=0x02
  > Feb 28 18:27:07 zim kernel: domain=0, bus=0, slot=31, func=3
  > Feb 28 18:27:07 zim kernel: class=0c-05-00, hdrtype=0x00, mfdev=0
  > Feb 28 18:27:07 zim kernel: cmdreg=0x0003, statreg=0x0280, cachelnsz=0 (dwords)
  > Feb 28 18:27:07 zim kernel: lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
  > Feb 28 18:27:07 zim kernel: intpin=b, irq=18
  > Feb 28 18:27:07 zim kernel: pci0:0:31:3: reprobing on driver added
  > Feb 28 18:27:07 zim kernel: pci1: driver added
  > Feb 28 18:27:07 zim kernel: pci2: driver added
  > Feb 28 18:27:07 zim kernel: pci3: driver added
  > Feb 28 18:27:07 zim kernel: pci4: driver added
  > Feb 28 18:27:07 zim kernel: found->     vendor=0x11ab, dev=0x4362, revid=0x22
  > Feb 28 18:27:07 zim kernel: domain=0, bus=4, slot=0, func=0
  > Feb 28 18:27:07 zim kernel: class=02-00-00, hdrtype=0x00, mfdev=0
  > Feb 28 18:27:07 zim kernel: cmdreg=0x0007, statreg=0x0010, cachelnsz=1 (dwords)
  > Feb 28 18:27:07 zim kernel: lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
  > Feb 28 18:27:07 zim kernel: intpin=a, irq=17
  > Feb 28 18:27:07 zim kernel: powerspec 2  supports D0 D1 D2 D3  current D0
  > Feb 28 18:27:07 zim kernel: MSI supports 2 messages, 64 bit
  > Feb 28 18:27:07 zim kernel: pci0:4:0:0: reprobing on driver added
  > Feb 28 18:27:07 zim kernel: mskc0: <Marvell Yukon 88E8053 Gigabit Ethernet> port 0xae00-0xaeff mem 0xfdefc000-0xfdefffff irq 17 at device 0.0 on pci4
  > Feb 28 18:27:07 zim kernel: pcib4: mskc0 requested memory range 0xfdefc000-0xfdefffff: good
  > Feb 28 18:27:07 zim kernel: mskc0: MSI count : 2
  > Feb 28 18:27:07 zim kernel: mskc0: attempting to allocate 2 MSI vectors (2 supported)
  > Feb 28 18:27:07 zim kernel: msi: routing MSI IRQ 256 to vector 58
  > Feb 28 18:27:07 zim kernel: msi: routing MSI IRQ 257 to vector 59
  > Feb 28 18:27:07 zim kernel: mskc0: using IRQs 256-257 for MSI
  > Feb 28 18:27:07 zim kernel: mskc0: RAM buffer size : 48KB
  > Feb 28 18:27:07 zim kernel: mskc0: Port 0 : Rx Queue 32KB(0x00000000:0x00007fff)
  > Feb 28 18:27:07 zim kernel: mskc0: Port 0 : Tx Queue 16KB(0x00008000:0x0000bfff)
  > Feb 28 18:27:07 zim kernel: msi: Assigning MSI IRQ 256 to local APIC 0
  > Feb 28 18:27:07 zim kernel: mskc0: [MPSAFE]
  > Feb 28 18:27:07 zim kernel: mskc0: [FILTER]
  > Feb 28 18:27:07 zim kernel: pci5: driver added
  > Feb 28 18:27:07 zim kernel: msk0: <Marvell Technology Group Ltd. Yukon EC Id 0xb6 Rev 0x02> on mskc0
  > Feb 28 18:27:07 zim kernel: msk0: bpf attached
  > Feb 28 18:27:07 zim kernel: msk0: Ethernet address: 00:01:29:a3:3c:a3
  > Feb 28 18:27:07 zim kernel: miibus0: <MII bus> on msk0
  > Feb 28 18:27:07 zim kernel: e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 0 on miibus0
  > Feb 28 18:27:07 zim kernel: e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
  > Feb 28 18:27:07 zim kernel: msk0: link state changed to DOWN
  > Feb 28 18:27:09 zim kernel: msk0: link state changed to UP
 
 Hmm, I guess this one is not related to your previous bug
 report. From the boot messages above, it looks like msk(4) is using
 MSI. How about disabling MSI by setting the hw.msk.msi_disable
 tunable to 1?
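
 As a sketch of what that looks like in practice, assuming the tunable
 is set through /boot/loader.conf (the usual place for loader tunables)
 so that it is in effect before the driver attaches:

```
# /boot/loader.conf
# Ask msk(4) not to use MSI and fall back to legacy interrupts.
hw.msk.msi_disable="1"
```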
 
 -- 
 Regards,
 Pyun YongHyeon

From: David Schultz <das@FreeBSD.ORG>
To: Pyun YongHyeon <pyunyh@gmail.com>
Cc: Hugo <hugo@barafranca.com>, yongari@FreeBSD.ORG,
        freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/115882: msk driver always fails under moderate network load.
Date: Mon, 17 Mar 2008 01:45:04 -0400

 On Fri, Feb 29, 2008, Pyun YongHyeon wrote:
 > Hmm, I guess this one is not related to your previous bug
 > report. From the boot messages above, it looks like msk(4) is using
 > MSI. How about disabling MSI by setting the hw.msk.msi_disable
 > tunable to 1?
 
 I set hw.pci.enable_msi and hw.pci.enable_msix to 0, and I have had
 no problems for the last two weeks.
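
 For reference, a sketch of the corresponding /boot/loader.conf
 entries, assuming the tunables were set there rather than at the
 loader prompt. Unlike hw.msk.msi_disable, these turn off MSI and
 MSI-X for every PCI device, not just msk(4):

```
# /boot/loader.conf
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
```
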
State-Changed-From-To: open->closed 
State-Changed-By: yongari 
State-Changed-When: Tue Mar 18 01:46:06 UTC 2008 
State-Changed-Why:  
A workaround for a bug that caused a hardware hang was MFCed to RELENG_7
and RELENG_6. Closing this PR, as a workaround for the watchdog timeout on
MacBook is available as a tunable.
Thanks for testing! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=115882 

From: "Mars G Miro" <spry@anarchy.in.the.ph>
To: bug-followup@FreeBSD.org, Hugo <hugo@barafranca.com>
Cc: "Pyun YongHyeon" <pyunyh@gmail.com>, "David Schultz" <das@FreeBSD.ORG>
Subject: Re: kern/115882: [msk] msk driver always fails under moderate network load.
Date: Wed, 21 May 2008 12:15:17 +0800

 Greetz,
 
      I'm seeing this problem too, in one of my boxens. It is a Nexgate
 NSA1086 (an older
 http://www.nexcom.com/ProductModel.aspx?id=d791325b-f791-4fc3-b98b-a482637c72e3)
 that has 4 msk and 4 sk NICs. Its docs say the msk is connected to its
 PCI-e bus while the sk are connected to its PCI-X bus. The box runs on
 RELENG_7 (csup'd 20080409) and even has 1.18.2.11 of if_msk.c. I've
 tried the patch at
 http://people.freebsd.org/~yongari/msk/msk.pcierr.patch but to no
 avail. Even disabling hw.pci.enable_msix and hw.pci.enable_msi before
 & after the msk.pcierr.patch doesn't help. pciconf -lvc tells me:
 
   mskc0@pci0:1:0:0:       class=0x020000 card=0x522111ab chip=0x436211ab rev=0x15 hdr=0x00
     vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
     device     = '88E8053 Marvell Yukon 88E8053 PCI-E Gigabit Ethernet Controller'
     class      = network
     subclass   = ethernet
     cap 01[48] = powerspec 2  supports D0 D1 D2 D3  current D0
     cap 03[50] = VPD
     cap 05[5c] = MSI supports 2 messages, 64 bit enabled with 2 messages
     cap 10[e0] = PCI-Express 1 legacy endpoint
 
      The quickest way I could reproduce the problem is to run an iperf
 server on this box and from another box, fire up an iperf client
 sending 200G of data. In about 2 hours, the NIC becomes unusable and
 no amount of ifconfig up/down can help it, forcing me to reboot. The
 odd thing is that in my test setup the iperf client is also an msk (a
 Gigabyte GA-965P mobo) and doesn't have problems at all. I am willing
 to test patches.
 
      Thanks.
 
 
 cheers
 mars

From: "Mars G Miro" <spry@anarchy.in.the.ph>
To: bug-followup@freebsd.org, Hugo <hugo@barafranca.com>, 
	"Pyun YongHyeon" <pyunyh@gmail.com>
Cc: "David Schultz" <das@freebsd.org>, yongari@FreeBSD.org, 
	"Kudo Chien" <ckchien@gmail.com>
Subject: Re: kern/115882: [msk] msk driver always fails under moderate network load.
Date: Fri, 23 May 2008 13:56:37 +0800

 On Wed, May 21, 2008 at 12:15 PM, Mars G Miro <spry@anarchy.in.the.ph> wrote:
 > Greetz,
 >
 >     I'm seeing this problem too, in one of my boxens. It is a Nexgate
 > NSA1086 (an older
 > http://www.nexcom.com/ProductModel.aspx?id=d791325b-f791-4fc3-b98b-a482637c72e3)
 > that has 4 msk and 4 sk NICs. Its docs say the msk is connected to its
 > PCI-e bus while the sk are connected to its PCI-X bus. The box runs on
 > RELENG_7 (csup'd 20080409) and even has 1.18.2.11 of if_msk.c. I've
 > tried the patch at
 > http://people.freebsd.org/~yongari/msk/msk.pcierr.patch but to no
 > avail. Even disabling hw.pci.enable_msix and hw.pci.enable_msi before
 > & after the msk.pcierr.patch doesn't help. pciconf -lvc tells me:
 >
 >  mskc0@pci0:1:0:0:       class=0x020000 card=0x522111ab
 > chip=0x436211ab rev=0x15 hdr=0x00
 >    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
 >    device     = '88E8053 Marvell Yukon 88E8053 PCI-E Gigabit Ethernet
 > Controller'
 >    class      = network
 >    subclass   = ethernet
 >    cap 01[48] = powerspec 2  supports D0 D1 D2 D3  current D0
 >    cap 03[50] = VPD
 >    cap 05[5c] = MSI supports 2 messages, 64 bit enabled with 2 messages
 >    cap 10[e0] = PCI-Express 1 legacy endpoint
 >
 >     The quickest way I could reproduce the problem is to run an iperf
 > server on this box and from another box, fire up an iperf client
 > sending 200G of data. In about ~ 2 hours, the NIC becomes unusable and
 > no amount of ifconfig up/down can help it, forcing me to reboot. The
 > odd thing is that in my test setup the iperf client is also an msk (a
 > Gigabyte GA-965P mobo) and doesn't have problems at all. I am willing
 > to test patches.
 >
 
 
 I guess I came to my conclusions a wee bit early on the GA-965P mobo's
 msk NIC. It has the same problem: it manifested itself while I was
 rsync'ing a /usr/obj (3.1G). It's interesting that when I was using it
 as an iperf client (as described above) it didn't have the problem at
 all. Another guy with a similar mobo filed PR kern/116853. He's CC'd.
 
 Thanks.
 
 >     Thanks.
 >
 >
 
 
 cheers
 mars

From: Pyun YongHyeon <pyunyh@gmail.com>
To: Mars G Miro <spry@anarchy.in.the.ph>
Cc: bug-followup@freebsd.org, Hugo <hugo@barafranca.com>,
        David Schultz <das@freebsd.org>, yongari@freebsd.org,
        Kudo Chien <ckchien@gmail.com>
Subject: Re: kern/115882: [msk] msk driver always fails under moderate network load.
Date: Fri, 23 May 2008 15:19:54 +0900

 On Fri, May 23, 2008 at 01:56:37PM +0800, Mars G Miro wrote:
  > On Wed, May 21, 2008 at 12:15 PM, Mars G Miro <spry@anarchy.in.the.ph> wrote:
  > > Greetz,
  > >
  > >     I'm seeing this problem too, in one of my boxens. It is a Nexgate
  > > NSA1086 (an older
  > > http://www.nexcom.com/ProductModel.aspx?id=d791325b-f791-4fc3-b98b-a482637c72e3)
  > > that has 4 msk and 4 sk NICs. Its docs say the msk is connected to its
  > > PCI-e bus while the sk are connected to its PCI-X bus. The box runs on
  > > RELENG_7 (csup'd 20080409) and even has 1.18.2.11 of if_msk.c. I've
  > > tried the patch at
  > > http://people.freebsd.org/~yongari/msk/msk.pcierr.patch but to no
  > > avail. Even disabling hw.pci.enable_msix and hw.pci.enable_msi before
  > > & after the msk.pcierr.patch doesn't help. pciconf -lvc tells me:
  > >
  > >  mskc0@pci0:1:0:0:       class=0x020000 card=0x522111ab
  > > chip=0x436211ab rev=0x15 hdr=0x00
  > >    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
  > >    device     = '88E8053 Marvell Yukon 88E8053 PCI-E Gigabit Ethernet
  > > Controller'
  > >    class      = network
  > >    subclass   = ethernet
  > >    cap 01[48] = powerspec 2  supports D0 D1 D2 D3  current D0
  > >    cap 03[50] = VPD
  > >    cap 05[5c] = MSI supports 2 messages, 64 bit enabled with 2 messages
  > >    cap 10[e0] = PCI-Express 1 legacy endpoint
  > >
  > >     The quickest way I could reproduce the problem is to run an iperf
  > > server on this box and from another box, fire up an iperf client
  > > sending 200G of data. In about ~ 2 hours, the NIC becomes unusable and
  > > no amount of ifconfig up/down can help it, forcing me to reboot. The
  > > odd thing is that in my test setup the iperf client is also an msk (a
  > > Gigabyte GA-965P mobo) and doesn't have problems at all. I am willing
  > > to test patches.
  > >
  > 
  > 
  > I guess I came to my conclusions a wee bit early on the GA-965P mobo's
  > msk NIC. It has the same problem: it manifested itself while I was
  > rsync'ing a /usr/obj (3.1G). It's interesting that when I was using it
  > as an iperf client (as described above) it didn't have the problem at
  > all. Another guy with a similar mobo filed PR kern/116853. He's CC'd.
  > 
 
 I guess the problem you've encountered has nothing to do with this
 PR, so it would be better if you could open another PR for this
 issue.
 I think I fixed the hardware hang issue of the 88E8053, but your case
 indicates a problem that wasn't covered by the workaround. In order
 to verify whether you are seeing the same kind of hardware bug:
 
 1. Can you check whether msk(4) received flow-control frames from
    the sender? Since msk(4) has no hardware counter support yet, you
    may have to check the statistics of the sender or the switch.
 
 2. When msk(4) is not responding, can you send packets to other
    hosts via msk(4)?  Another check point would be whether you
    can still see incoming packets with tcpdump on msk(4).
 
 3. Show me the output of 'ifconfig msk0' when msk(4) is not
    responding.
 
 4. When msk(4) is not responding, can you check whether msk(4) is
    still generating interrupts? (Check the output of 'systat -vmstat 1'.)
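
 For step 4, one way to tell whether the controller is still raising
 interrupts is to sample the cumulative count from 'vmstat -i' twice
 and compare the two numbers. The helper below is only a sketch: the
 name irq_total is made up for illustration, and it assumes the usual
 'vmstat -i' layout ("irqN: device  total  rate") and that the
 controller is listed as mskc0.

```sh
# Hypothetical helper, not part of msk(4) or the base system.
# Reads `vmstat -i` output on stdin and prints the cumulative interrupt
# count for the device named in $1 (0 if the device is not listed).
# Assumes lines of the form "irq256: mskc0   <total>   <rate>".
irq_total() {
    awk -v dev="$1" '$2 == dev { sum += $(NF - 1) } END { print sum + 0 }'
}
```

 Running 'vmstat -i | irq_total mskc0' twice, a few seconds apart,
 while the interface is not responding should show whether the count
 is still increasing. An unchanged value would suggest that the
 controller has stopped generating interrupts.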
 
 PR 116853 is a completely different one. His controller is not an
 88E8053.
 
 -- 
 Regards,
 Pyun YongHyeon
>Unformatted:
