From nobody@FreeBSD.org  Sat Feb 20 20:43:11 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 61E1C106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 20 Feb 2010 20:43:11 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 50A5C8FC14
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 20 Feb 2010 20:43:11 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o1KKhB3U041956
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 20 Feb 2010 20:43:11 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o1KKhBXO041955;
	Sat, 20 Feb 2010 20:43:11 GMT
	(envelope-from nobody)
Message-Id: <201002202043.o1KKhBXO041955@www.freebsd.org>
Date: Sat, 20 Feb 2010 20:43:11 GMT
From: "Emanuele A. Bagnaschi" <zephyrus.271@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [msk] Slow network performance and high system cpu load while transfering data
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         144148
>Category:       kern
>Synopsis:       [msk] Slow network performance and high system cpu load while transfering data
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    yongari
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Feb 20 20:50:00 UTC 2010
>Closed-Date:    Wed Mar 10 23:27:31 UTC 2010
>Last-Modified:  Wed Mar 10 23:27:31 UTC 2010
>Originator:     Emanuele A. Bagnaschi
>Release:        RELENG8
>Organization:
>Environment:
FreeBSD polaris 8.0-STABLE FreeBSD 8.0-STABLE #31: Fri Feb 19 18:55:27 CET 2010     toor@polaris:/usr/obj/usr/src/sys/POLARIS  amd64
>Description:
Hi,
I've been experiencing a troubling issue with a Marvell 8072 NIC on an HP
ProBook 4710s.
I first noticed the problem while transferring some files with scp to a FreeBSD8-STABLE server: system CPU usage skyrocketed to 100% and network performance was very poor (about 1.8 MiB/s).

At first I reproduced the issue in a controlled way with 'ttcp', but I was told on the FreeBSD stable mailing list that 'netperf' is a better choice because of problems in the 'ttcp' threading code. I therefore repeated the tests with netperf.
The tests show that the maximum achievable transfer rate is about 2 MiB/s. Disabling tso and txcsum (as suggested by Pyun YongHyeon on the FreeBSD stable mailing list) helps a bit but does not resolve the problem. The NIC works as expected on Linux, so I think we can rule out a server-side problem. The 'ttcp' tests were also originally run with MSI interrupts disabled, again with no improvement.
All tests were run with no firewall on both systems.
You can find the full 'netperf' output attached.
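
As a rough cross-check of these figures, the sketch below converts two of the attached netperf rates into link utilisation. It assumes that netperf's '-f M' unit means 2^20 bytes per second and that the link runs at the 100baseTX speed reported by ifconfig; the function names are made up for illustration.

```python
# Illustrative conversion only; the rates are the ones quoted in this report.
MIB = 2 ** 20  # assumption: netperf's "-f M" unit is 2^20 bytes


def mib_per_s_to_mbit_per_s(rate_mib_s: float) -> float:
    """Convert a rate in MiB/s to decimal megabits per second."""
    return rate_mib_s * MIB * 8 / 1_000_000


def link_utilisation(rate_mib_s: float, link_mbit_s: float = 100.0) -> float:
    """Fraction of the nominal link speed used by the given rate."""
    return mib_per_s_to_mbit_per_s(rate_mib_s) / link_mbit_s


runs = [
    ("Linux on the laptop", 11.19),
    ("FreeBSD msk(4), tso/txcsum disabled", 2.23),
]
for label, rate in runs:
    print(f"{label}: {mib_per_s_to_mbit_per_s(rate):.1f} Mbit/s "
          f"({link_utilisation(rate):.0%} of 100baseTX)")
```

Under those assumptions the Linux run is close to the practical limit of a 100 Mbit/s link, while the msk(4) run uses only about a fifth of it.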

Here is some relevant information to identify the NIC:

First, from 'dmesg':

mskc0: <Marvell Yukon 88E8072 Gigabit Ethernet> port 0x2000-0x20ff mem 0x90100000-0x90103fff irq 17 at device 0.0 on pci134
msk0: <Marvell Technology Group Ltd. Yukon EX Id 0xb5 Rev 0x02> on mskc0
msk0: Ethernet address: 00:25:b3:52:fc:fa
miibus0: <MII bus> on msk0
mskc0: [FILTER]
msk0: link state changed to DOWN
msk0: link state changed to UP

and then from 'pciconf -lv'

mskc0@pci0:134:0:0:     class=0x020000 card=0x3074103c chip=0x436c11ab rev=0x10 hdr=0x00
    vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
    device     = 'Marvell 8072 Ethernet Nic (88E8072)'
    class      = network
    subclass   = ethernet

Here is the output of 'ifconfig':

msk0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:25:b3:52:fc:fa
        inet 192.168.1.4 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (100baseTX <full-duplex,flag0,flag1>)
        status: active


According to Pyun YongHyeon (again on the FreeBSD stable mailing list), the NIC is a Yukon Extreme controller revision B0, which is known to have some silicon bugs.

I know that, without errata and data sheets, probably there is not much that can be done at the moment, but I thought that filing a PR would be at least a good way to point out the issue. Thanks for any help.

Best regards
Emanuele A. Bagnaschi
>How-To-Repeat:
Transfer data through the network interface.
>Fix:


Patch attached with submission follows:

All netperf runs were executed with the following command line:

netperf -T TCP_SENDFILE -H hostname -f 'M'  -I 99,5 -i 10,2
--

* notebook - Linux 2.6.32 - hostname: elsinore *
* server - FreeBSD8-STABLE - hostname:atlantis * 

FIRST RUN - netperf executed on elsinore (Linux - Laptop)

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to atlantis (192.168.1.1) port 0 AF_INET : +/-2.500% @ 99% conf.  : cpu bind
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    MBytes/sec  

 65536  16384  16384    10.04      11.19   


SECOND RUN - netperf executed on atlantis (FreeBSD - Server)

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to elsinore (192.168.1.4) port 0 AF_INET : +/-2.500% @ 99% conf.  : cpu bind
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    MBytes/sec  

 87380  32768  32768    10.02      11.10 

-----

* notebook - FreeBSD8-STABLE - hostname:polaris - msk with txcsum and tso disabled *
* server - FreeBSD8-STABLE - hostname:atlantis *


FIRST RUN - netperf executed on polaris (FreeBSD - laptop)

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to atlantis (192.168.1.1) port 0 AF_INET : +/-2.500% @ 99% conf.  : cpu bind
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 74.033%
!!!                       Local CPU util  : 0.000%
!!!                       Remote CPU util : 0.000%

Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    MBytes/sec  

 65536  32768  32768    11.44       2.23 


SECOND RUN - netperf executed on atlantis (FreeBSD - server)

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to elsinore (192.168.1.4) port 0 AF_INET : +/-2.500% @ 99% conf.  : cpu bind
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 10.916%
!!!                       Local CPU util  : 0.000%
!!!                       Remote CPU util : 0.000%

Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    MBytes/sec  

 65536  32768  32768    10.08       2.58  

-----

* notebook - FreeBSD8-STABLE - hostname:polaris - msk with txcsum and tso enabled *
* server - FreeBSD8-STABLE - hostname:atlantis *


FIRST RUN - polaris transmits, atlantis receives

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to atlantis (192.168.1.1) port 0 AF_INET : +/-2.500% @ 99% conf.  : cpu bind
netperf: data send error: Connection reset by peer
len was -1

SECOND RUN - polaris receives, atlantis transmits

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to elsinore (192.168.1.4) port 0 AF_INET : +/-2.500% @ 99% conf.  : cpu bind
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 5.228%
!!!                       Local CPU util  : 0.000%
!!!                       Remote CPU util : 0.000%

Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    MBytes/sec  

 65536  32768  32768    10.08       2.63
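
The warnings in the FreeBSD runs can be read against the '-I 99,5' option from the command line above, which netperf echoes as '+/-2.500% @ 99% conf.'. The sketch below pulls the reported throughput interval out of a warning block and compares it with the requested width. It assumes the reported percentage is on the same scale as the requested 5% width; the helper name and the constant are hypothetical.

```python
import re
from typing import Optional

# Assumption: "-I 99,5" requests a 5% wide interval at 99% confidence.
REQUESTED_WIDTH_PERCENT = 5.0


def throughput_interval(netperf_output: str) -> Optional[float]:
    """Return the reported throughput interval in percent, or None if absent."""
    match = re.search(
        r"Confidence intervals:\s*Throughput\s*:\s*([\d.]+)%", netperf_output
    )
    return float(match.group(1)) if match else None


# Warning block copied from the first FreeBSD run (tso and txcsum disabled).
run = """\
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! Confidence intervals: Throughput      : 74.033%
"""

width = throughput_interval(run)
achieved = width is None or width <= REQUESTED_WIDTH_PERCENT
print(f"reported interval: {width}%, within requested width: {achieved}")
```

Read this way, the run-to-run variation on msk(4) is far larger than requested, so the individual throughput figures for those runs should be treated as rough values.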


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-amd64->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun Feb 21 00:03:02 UTC 2010 
Responsible-Changed-Why:  
reassign. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144148 
State-Changed-From-To: open->feedback 
State-Changed-By: yongari 
State-Changed-When: Mon Feb 22 02:01:24 UTC 2010 
State-Changed-Why:  
I remember there are stability issues on 88E8072, but they are somewhat 
hard to fix, mainly because I don't have the hardware to narrow 
down the issue. It seems you have two issues here: low performance 
numbers and high CPU usage during transfers. I have no idea how 
msk(4) could cause high CPU usage, even if it gives poor performance. 
Would you show me the output of "sysctl dev.msk.0.stats"? Please also 
capture the traffic with tcpdump on the receiver side (not on the 
host with msk(4)) and send it to me. 


Responsible-Changed-From-To: freebsd-net->yongari 
Responsible-Changed-By: yongari 
Responsible-Changed-When: Mon Feb 22 02:01:24 UTC 2010 
Responsible-Changed-Why:  
Grab. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144148 

From: Zephyrus <zephyrus.271@gmail.com>
To: bug-followup@FreeBSD.org, zephyrus.271@gmail.com
Cc:  
Subject: Re: kern/144148: [msk] Slow network performance and high system cpu 
	load while transfering data
Date: Mon, 22 Feb 2010 20:18:00 +0100

 This is the output of 'sysctl dev.msk.0.stats'.
 I have also sent the traffic captured with 'tcpdump' to your email address.
 
 dev.msk.0.stats.rx.ucast_frames: 524574
 dev.msk.0.stats.rx.bcast_frames: 1374
 dev.msk.0.stats.rx.pause_frames: 0
 dev.msk.0.stats.rx.mcast_frames: 0
 dev.msk.0.stats.rx.crc_errs: 0
 dev.msk.0.stats.rx.good_octets: 43912137
 dev.msk.0.stats.rx.bad_octets: 0
 dev.msk.0.stats.rx.frames_64: 952
 dev.msk.0.stats.rx.frames_65_127: 518319
 dev.msk.0.stats.rx.frames_128_255: 1079
 dev.msk.0.stats.rx.frames_256_511: 610
 dev.msk.0.stats.rx.frames_512_1023: 798
 dev.msk.0.stats.rx.frames_1024_1518: 4190
 dev.msk.0.stats.rx.frames_1519_max: 0
 dev.msk.0.stats.rx.frames_too_long: 0
 dev.msk.0.stats.rx.jabbers: 0
 dev.msk.0.stats.rx.overflows: 0
 dev.msk.0.stats.tx.ucast_frames: 780783
 dev.msk.0.stats.tx.bcast_frames: 4
 dev.msk.0.stats.tx.pause_frames: 8361
 dev.msk.0.stats.tx.mcast_frames: 0
 dev.msk.0.stats.tx.octets: 1154509263
 dev.msk.0.stats.tx.frames_64: 8558
 dev.msk.0.stats.tx.frames_65_127: 6620
 dev.msk.0.stats.tx.frames_128_255: 426
 dev.msk.0.stats.tx.frames_256_511: 454
 dev.msk.0.stats.tx.frames_512_1023: 23063
 dev.msk.0.stats.tx.frames_1024_1518: 750027
 dev.msk.0.stats.tx.frames_1519_max: 0
 dev.msk.0.stats.tx.colls: 0
 dev.msk.0.stats.tx.late_colls: 0
 dev.msk.0.stats.tx.excess_colls: 0
 dev.msk.0.stats.tx.multi_colls: 0
 dev.msk.0.stats.tx.single_colls: 0
 dev.msk.0.stats.tx.underflows: 0
 
 Best regards,
 
 Emanuele A. Bagnaschi
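
A quick way to scan counters like these is to parse the 'sysctl' text and list only the error counters that are non-zero. The following is a minimal sketch using a subset of the values above; the function name and the list of error suffixes are choices made for this example, not part of msk(4).

```python
def parse_sysctl_stats(text: str) -> dict:
    """Parse 'name: value' lines from sysctl output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and value.strip().isdigit():
            stats[name] = int(value)
    return stats


# Subset of the counters posted in this follow-up.
sample = """\
dev.msk.0.stats.rx.crc_errs: 0
dev.msk.0.stats.rx.overflows: 0
dev.msk.0.stats.tx.ucast_frames: 780783
dev.msk.0.stats.tx.pause_frames: 8361
dev.msk.0.stats.tx.colls: 0
dev.msk.0.stats.tx.underflows: 0
"""

stats = parse_sysctl_stats(sample)

# Hypothetical selection of counters treated as error indicators.
ERROR_SUFFIXES = ("crc_errs", "overflows", "underflows", "colls",
                  "jabbers", "frames_too_long")
errors = {k: v for k, v in stats.items() if k.endswith(ERROR_SUFFIXES) and v}

pause_share = (stats["dev.msk.0.stats.tx.pause_frames"]
               / stats["dev.msk.0.stats.tx.ucast_frames"])
print(f"non-zero error counters: {errors}")
print(f"tx pause frames per tx unicast frame: {pause_share:.2%}")
```

In this subset no error counter is set, and the only value that stands out is tx.pause_frames. Whether the pause frames are related to the slowdown is not established in this report.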

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144148: commit references a PR
Date: Wed,  3 Mar 2010 17:57:01 +0000 (UTC)

 Author: yongari
 Date: Wed Mar  3 17:56:52 2010
 New Revision: 204647
 URL: http://svn.freebsd.org/changeset/base/204647
 
 Log:
   Remove programming LED register and enable 25MHz TX clock for
   88E1149 PHY. This will fix intermittent watchdog timeouts as well
   as very slow network performance on 88E8072 Yukon Extreme.
   
   PR:	kern/144148
   MFC after:	1 week
 
 Modified:
   head/sys/dev/mii/e1000phy.c
 
 Modified: head/sys/dev/mii/e1000phy.c
 ==============================================================================
 --- head/sys/dev/mii/e1000phy.c	Wed Mar  3 17:55:51 2010	(r204646)
 +++ head/sys/dev/mii/e1000phy.c	Wed Mar  3 17:56:52 2010	(r204647)
 @@ -276,7 +276,6 @@ e1000phy_reset(struct mii_softc *sc)
  	case MII_MODEL_MARVELL_E1118:
  		break;
  	case MII_MODEL_MARVELL_E1116:
 -	case MII_MODEL_MARVELL_E1149:
  		page = PHY_READ(sc, E1000_EADR);
  		/* Select page 3, LED control register. */
  		PHY_WRITE(sc, E1000_EADR, 3);
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: feedback->patched 
State-Changed-By: yongari 
State-Changed-When: Wed Mar 3 18:13:18 UTC 2010 
State-Changed-Why:  
Submitter confirms r204647 fixed the issue. 
Thanks for testing! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144148 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144148: commit references a PR
Date: Wed, 10 Mar 2010 22:21:26 +0000 (UTC)

 Author: yongari
 Date: Wed Mar 10 22:21:07 2010
 New Revision: 204985
 URL: http://svn.freebsd.org/changeset/base/204985
 
 Log:
   MFC r204647:
     Remove programming LED register and enable 25MHz TX clock for
     88E1149 PHY. This will fix intermittent watchdog timeouts as well
     as very slow network performance on 88E8072 Yukon Extreme.
   
     PR:	kern/144148
 
 Modified:
   stable/8/sys/dev/mii/e1000phy.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
 
 Modified: stable/8/sys/dev/mii/e1000phy.c
 ==============================================================================
 --- stable/8/sys/dev/mii/e1000phy.c	Wed Mar 10 22:10:36 2010	(r204984)
 +++ stable/8/sys/dev/mii/e1000phy.c	Wed Mar 10 22:21:07 2010	(r204985)
 @@ -276,7 +276,6 @@ e1000phy_reset(struct mii_softc *sc)
  	case MII_MODEL_MARVELL_E1118:
  		break;
  	case MII_MODEL_MARVELL_E1116:
 -	case MII_MODEL_MARVELL_E1149:
  		page = PHY_READ(sc, E1000_EADR);
  		/* Select page 3, LED control register. */
  		PHY_WRITE(sc, E1000_EADR, 3);
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144148: commit references a PR
Date: Wed, 10 Mar 2010 22:24:14 +0000 (UTC)

 Author: yongari
 Date: Wed Mar 10 22:23:55 2010
 New Revision: 204986
 URL: http://svn.freebsd.org/changeset/base/204986
 
 Log:
   MFC r204647:
     Remove programming LED register and enable 25MHz TX clock for
     88E1149 PHY. This will fix intermittent watchdog timeouts as well
     as very slow network performance on 88E8072 Yukon Extreme.
   
     PR:	kern/144148
 
 Modified:
   stable/7/sys/dev/mii/e1000phy.c
 Directory Properties:
   stable/7/sys/   (props changed)
   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
   stable/7/sys/contrib/dev/acpica/   (props changed)
   stable/7/sys/contrib/pf/   (props changed)
 
 Modified: stable/7/sys/dev/mii/e1000phy.c
 ==============================================================================
 --- stable/7/sys/dev/mii/e1000phy.c	Wed Mar 10 22:21:07 2010	(r204985)
 +++ stable/7/sys/dev/mii/e1000phy.c	Wed Mar 10 22:23:55 2010	(r204986)
 @@ -275,7 +275,6 @@ e1000phy_reset(struct mii_softc *sc)
  	case MII_MODEL_MARVELL_E1118:
  		break;
  	case MII_MODEL_MARVELL_E1116:
 -	case MII_MODEL_MARVELL_E1149:
  		page = PHY_READ(sc, E1000_EADR);
  		/* Select page 3, LED control register. */
  		PHY_WRITE(sc, E1000_EADR, 3);
 
State-Changed-From-To: patched->closed 
State-Changed-By: yongari 
State-Changed-When: Wed Mar 10 23:26:58 UTC 2010 
State-Changed-Why:  
MFC to stable/8 and stable/7 done. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144148 
>Unformatted:
