From nobody@FreeBSD.org  Sat Jan 18 18:58:08 2014
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id E3A27E7D
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 18 Jan 2014 18:58:08 +0000 (UTC)
Received: from oldred.freebsd.org (oldred.freebsd.org [IPv6:2001:1900:2254:206a::50:4])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id C304F1783
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 18 Jan 2014 18:58:08 +0000 (UTC)
Received: from oldred.freebsd.org ([127.0.1.6])
	by oldred.freebsd.org (8.14.5/8.14.7) with ESMTP id s0IIw7RT052517
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 18 Jan 2014 18:58:07 GMT
	(envelope-from nobody@oldred.freebsd.org)
Received: (from nobody@localhost)
	by oldred.freebsd.org (8.14.5/8.14.5/Submit) id s0IIw7Ej052467;
	Sat, 18 Jan 2014 18:58:07 GMT
	(envelope-from nobody)
Message-Id: <201401181858.s0IIw7Ej052467@oldred.freebsd.org>
Date: Sat, 18 Jan 2014 18:58:07 GMT
From: Eric Dombroski <eric@edombroski.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Major performance/stability regression in virtio network drivers between 9.2-RELEASE and 10.0-RC5
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         185864
>Category:       kern
>Synopsis:       [virtio] Major performance/stability regression in virtio network drivers between 9.2-RELEASE and 10.0-RC5
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bryanv
>State:          feedback
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 18 19:00:00 UTC 2014
>Closed-Date:    
>Last-Modified:  Sat May  3 17:00:00 UTC 2014
>Originator:     Eric Dombroski
>Release:        10.0RC5
>Organization:
>Environment:
FreeBSD umaro 10.0-RC5 FreeBSD 10.0-RC5 #0 r260430: Wed Jan  8 05:10:04 UTC 2014     root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
I believe there is a major performance regression between FreeBSD 9.2-RELEASE and 10.0-RC5 involving the virtio network drivers (vtnet) and their handling of incoming traffic.  Below are the results of some iperf tests and large dd operations over NFS.  Write throughput drops from ~40 Gbit/s on 9.2 to ~2.4 Gbit/s on 10.0-RC5, and over time the connection becomes unstable ("no buffer space available"), requiring the interface to be taken down and brought back up.

These results are on fresh installs of 9.2 and 10.0RC5, with no sysctl tweaks on either system.

I can't reproduce this using an Intel 1 Gbit/s ethernet card through PCIe passthrough, although I suspect the problem only manifests above 1 Gbit/s anyway.

Tests: 

Client (host):
  root@gogo:~# uname -a
  Linux gogo 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux
  root@gogo:~# kvm -version
  QEMU emulator version 1.1.2 (qemu-kvm-1.1.2+dfsg-6, Debian), Copyright (c) 2003-2008 Fabrice Bellard
  root@gogo:~# lsmod | grep vhost
  vhost_net              27436  3
  tun                    18337  8 vhost_net
  macvtap                17633  1 vhost_net
  

  Command: iperf -c 192.168.100.x -t 60


Server (FreeBSD 9.2 VM):

      root@umarotest:~ # uname -a
      FreeBSD umarotest 9.2-RELEASE-p3 FreeBSD 9.2-RELEASE-p3 #0: Sat Jan 11 03:25:02 UTC 2014     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
      root@umarotest:~ # iperf -s
      ------------------------------------------------------------
      Server listening on TCP port 5001
      TCP window size: 64.0 KByte (default)
      ------------------------------------------------------------
      [  4] local 192.168.100.44 port 5001 connected with 192.168.100.1 port 58996
      [ ID] Interval       Transfer     Bandwidth
      [  4]  0.0-60.0 sec   293 GBytes  41.9 Gbits/sec
      [  5] local 192.168.100.44 port 5001 connected with 192.168.100.1 port 58997
      [  5]  0.0-60.0 sec   297 GBytes  42.5 Gbits/sec
      [  4] local 192.168.100.44 port 5001 connected with 192.168.100.1 port 58998
      [  4]  0.0-60.0 sec   291 GBytes  41.6 Gbits/sec
      [  5] local 192.168.100.44 port 5001 connected with 192.168.100.1 port 58999
      [  5]  0.0-60.0 sec   297 GBytes  42.6 Gbits/sec
      [  4] local 192.168.100.44 port 5001 connected with 192.168.100.1 port 59000
      [  4]  0.0-60.0 sec   297 GBytes  42.5 Gbits/sec

      While pinging out from the server to the client, I do not get any errors.


      root@umaro:~ # uname -a
      FreeBSD umaro 10.0-RC5 FreeBSD 10.0-RC5 #0 r260430: Wed Jan  8 05:10:04 UTC 2014     root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
      root@umaro:~ # iperf -s
      ------------------------------------------------------------
      Server listening on TCP port 5001
      TCP window size: 64.0 KByte (default)
      ------------------------------------------------------------
      [  4] local 192.168.100.5 port 5001 connected with 192.168.100.1 port 50264
      [ ID] Interval       Transfer     Bandwidth
      [  4]  0.0-60.0 sec  16.7 GBytes  2.39 Gbits/sec
      [  5] local 192.168.100.5 port 5001 connected with 192.168.100.1 port 50265
      [  5]  0.0-60.0 sec  18.3 GBytes  2.62 Gbits/sec
      [  4] local 192.168.100.5 port 5001 connected with 192.168.100.1 port 50266
      [  4]  0.0-60.0 sec  16.8 GBytes  2.40 Gbits/sec
      [  5] local 192.168.100.5 port 5001 connected with 192.168.100.1 port 50267
      [  5]  0.0-60.0 sec  16.8 GBytes  2.40 Gbits/sec
      [  4] local 192.168.100.5 port 5001 connected with 192.168.100.1 port 50268
      [  4]  0.0-60.0 sec  16.8 GBytes  2.41 Gbits/sec

      *** While pinging out from the server to client, frequent "ping: sendto: No space left on device" errors ***


      After a while, I can also reliably reproduce more egregious "ping: sendto: No buffer space available" errors after doing a large sequential write over NFS:

      mount -t nfs -o rsize=65536,wsize=65536 192.168.100.5:/storage/shared /mnt/nfs
      dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=30000




>How-To-Repeat:
Perform an iperf test between a KVM host and a FreeBSD 9.2 VM as the server, using the virtio network drivers, while pinging out from the FreeBSD system; compare to 10.0-RC5. Throughput drops from ~40 Gbit/s to ~2.4 Gbit/s, and the 10.0-RC5 system shows many "no space left on device" errors during pings.

Perform a large sequential write (dd, mysqldump, etc.) over NFS; eventually the interface stops passing traffic, with "no buffer space available" errors during ping.  The only remediation at that point is to take the interface down and bring it back up.
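The recovery step above can be sketched as follows (the interface name vtnet0 is an assumption; substitute the actual guest interface):

```shell
# Hypothetical recovery sequence once "no buffer space available" appears.
# Requires root; vtnet0 is an assumed interface name.
ifconfig vtnet0 down
ifconfig vtnet0 up
```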
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-amd64->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun Jan 19 17:29:26 UTC 2014 
Responsible-Changed-Why:  
reclassify. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=185864 
Responsible-Changed-From-To: freebsd-bugs->bryanv 
Responsible-Changed-By: bryanv 
Responsible-Changed-When: Wed Jan 22 07:39:43 UTC 2014 
Responsible-Changed-Why:  


http://www.freebsd.org/cgi/query-pr.cgi?pr=185864 

From: Baptiste Jonglez <bjonglez@illyse.org>
To: bug-followup@FreeBSD.org, eric@edombroski.com
Cc:  
Subject: Re: kern/185864: [virtio] Major performance/stability regression in
 virtio network drivers between 9.2-RELEASE and 10.0-RC5
Date: Sun, 27 Apr 2014 21:59:52 +0200

 
 Hi,
 
 I am experiencing a similar issue with 10.0-RELEASE.  An amd64 FreeBSD
 guest (KVM) on an amd64 linux host has very low network performance using
 the virtio driver.
 
 
 When receiving data from the host:
 
 root@glados:~ # iperf -s
 ------------------------------------------------------------
 Server listening on TCP port 5001
 TCP window size: 64.0 KByte (default)
 ------------------------------------------------------------
 [  4] local 172.23.184.94 port 5001 connected with 172.23.184.126 port 34588
 [ ID] Interval       Transfer     Bandwidth
 [  4]  0.0-10.2 sec  99.1 MBytes  81.4 Mbits/sec
 
 
 When sending data to the host:
 
 root@glados:~ # iperf -c portal
 ------------------------------------------------------------
 Client connecting to portal, TCP port 5001
 TCP window size: 32.5 KByte (default)
 ------------------------------------------------------------
 [  3] local 172.23.184.94 port 50880 connected with 172.23.184.126 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec   815 MBytes   682 Mbits/sec
 
 
 Even though the CPU is a low-end AMD CPU (E-350D APU), the receive
 throughput seems quite low.
 
 After seeing [1], I experimented with a static IPv4 address on the guest
 instead of a DHCP-assigned one.  The results are much better, see below:
 
 
 Receiving from the host:
 
 root@glados:~ # iperf -s
 ------------------------------------------------------------
 Server listening on TCP port 5001
 TCP window size: 64.0 KByte (default)
 ------------------------------------------------------------
 [  4] local 172.23.184.68 port 5001 connected with 172.23.184.126 port 49831
 [ ID] Interval       Transfer     Bandwidth
 [  4]  0.0-10.1 sec   346 MBytes   288 Mbits/sec
 
 
 Sending to the host:
 
 root@glados:~ # iperf -c portal
 ------------------------------------------------------------
 Client connecting to portal, TCP port 5001
 TCP window size: 40.5 KByte (default)
 ------------------------------------------------------------
 [  3] local 172.23.184.68 port 53975 connected with 172.23.184.126 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec  1.91 GBytes  1.64 Gbits/sec
 
 
 
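 For reference, a typical way to configure such a static address on
 FreeBSD is via /etc/rc.conf (a sketch; the interface name vtnet0 and
 the gateway are assumptions based on the output below):

 ```shell
 # /etc/rc.conf fragment (hypothetical values)
 ifconfig_vtnet0="inet 172.23.184.68 netmask 255.255.255.192"
 defaultrouter="172.23.184.126"
 ```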
 The guest system is a fresh install of 10.0-RELEASE:
 
 root@glados:~ # uname -a
 FreeBSD glados 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16
 22:34:59 UTC 2014 root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
 
 The interface has the following properties:
 
 vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=6c03bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
         ether 52:54:00:e4:d9:8d
         inet 172.23.184.94 netmask 0xffffffc0 broadcast 172.23.184.127
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
         media: Ethernet 10Gbase-T <full-duplex>
         status: active
 
 The relevant options of the kvm invocation on the host are:
 
 /usr/bin/kvm -S -M pc-1.1 -cpu Opteron_G3 -enable-kvm -m 6000 -smp 1,sockets=1,cores=1,threads=1 -rtc base=utc,driftfix=slew -no-kvm-pit-reinjection -no-hpet -no-shutdown -netdev tap,fd=26,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e4:d9:8d,bus=pci.0,addr=0x3
 
 The host is a Debian stable:
 
 root@portal:~# uname -a
 Linux portal 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
 
 
 Is there any way I can help troubleshoot this issue?
 
 Thanks,
 Baptiste
 
 [1] http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2014-01/msg00349.html
 
State-Changed-From-To: open->feedback 
State-Changed-By: bryanv 
State-Changed-When: Sat May 3 16:39:45 UTC 2014 
State-Changed-Why:  
Since you get much better performance with a static IP, that likely means getting the timecounter within the BPF is taking an absurd amount of time.  See the followup below for a workaround patch.

http://www.freebsd.org/cgi/query-pr.cgi?pr=185864 

From: Bryan Venteicher <bryanv@freebsd.org>
To: bug-followup@freebsd.org
Cc: Baptiste Jonglez <bjonglez@illyse.org>
Subject: Re: kern/185864: [virtio] Major performance/stability regression in
 virtio network drivers between 9.2-RELEASE and 10.0-RC5
Date: Sat, 3 May 2014 11:53:05 -0500

 
 Since you get much better performance with a static IP, that
 likely means getting the timecounter within the BPF is taking
 an absurd amount of time. This seems to only impact some flavors
 of QEMU on, IIRC, some hardware. There was a change in our BPF
 such that sometimes we want a more accurate timestamp; this is
 typically very cheap, but in this situation it is not, leading
 to a very high cost per packet.
 
 You can try this workaround patch:
 
 diff --git a/sys/net/bpf.c b/sys/net/bpf.c
 index cb3ed27..9751986 100644
 --- a/sys/net/bpf.c
 +++ b/sys/net/bpf.c
 @@ -2013,9 +2013,11 @@ bpf_gettime(struct bintime *bt, int tstype, struct mbuf *m)
                         return (BPF_TSTAMP_EXTERN);
                 }
         }
 +#if 0
         if (quality == BPF_TSTAMP_NORMAL)
                 binuptime(bt);
         else
 +#endif
                 getbinuptime(bt);
 
         return (quality);
 
 You can also enable LRO on the interface; that will likely bring
 the iperf receive performance closer to the send performance
 (which benefits from TSO).
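 For anyone trying this, the usual steps would be roughly the
 following (a sketch, assuming sources in /usr/src and the GENERIC
 kernel; the diff file name is hypothetical):

 ```shell
 # Apply the workaround in the source tree, then rebuild and install the kernel
 cd /usr/src
 patch < /path/to/bpf-workaround.diff   # hypothetical file name for the diff above
 make buildkernel KERNCONF=GENERIC
 make installkernel KERNCONF=GENERIC
 shutdown -r now

 # Separately, LRO can be enabled at runtime (vtnet0 assumed):
 ifconfig vtnet0 lro
 ```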
 
>Unformatted:
