From nobody@FreeBSD.org  Mon Oct 28 11:01:08 2013
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTP id 7301888F
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 28 Oct 2013 11:01:08 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from oldred.freebsd.org (oldred.freebsd.org [8.8.178.121])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id 3A98723CF
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 28 Oct 2013 11:01:08 +0000 (UTC)
Received: from oldred.freebsd.org ([127.0.1.6])
	by oldred.freebsd.org (8.14.5/8.14.7) with ESMTP id r9SB17e5099859
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 28 Oct 2013 11:01:07 GMT
	(envelope-from nobody@oldred.freebsd.org)
Received: (from nobody@localhost)
	by oldred.freebsd.org (8.14.5/8.14.5/Submit) id r9SB17BH099851;
	Mon, 28 Oct 2013 11:01:07 GMT
	(envelope-from nobody)
Message-Id: <201310281101.r9SB17BH099851@oldred.freebsd.org>
Date: Mon, 28 Oct 2013 11:01:07 GMT
From: Antal Pataki <pataki.antal@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: 10gigabit networking problems
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         183390
>Category:       kern
>Synopsis:       [ixgbe] 10gigabit networking problems
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    jfv
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 28 11:10:00 UTC 2013
>Closed-Date:    
>Last-Modified:  Mon Apr 28 06:10:00 UTC 2014
>Originator:     Antal Pataki
>Release:        9.2
>Organization:
Granaglia Ltd.
>Environment:
FreeBSD storagex.lan.granaglia.com 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898: Thu Sep 26 22:50:31 UTC 2013     root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
Hardware: IBM x3500 m4 (2x E5-2620, 16GB RAM)
Intel X520 DA2 10Gbit NIC (PCI-Express x8)
IBM ServeRAID M1115 with 8x600GB 15k rpm SAS disks.

System setup:
The system is installed on a geli-encrypted zpool.

The Intel 10Gbit NIC is directly connected to another IBM x3500 m4 (with
the same Intel card), which is running VMware ESXi 5.5.

The system provides an NFS share to the ESXi system through the 10 gigabit
connection.

The problem:

Without any load, if I ping the other machine through the 10 gigabit
connection, the ping output looks like this:

root@storagex:~ # ping 10.3.3.2
PING 10.3.3.2 (10.3.3.2): 56 data bytes
(...cutoff...)
64 bytes from 10.3.3.2: icmp_seq=89 ttl=64 time=0.106ms
ping: sendto: File too large
64 bytes from 10.3.3.2: icmp_seq=91 ttl=64 time=0.092ms
..etc..etc.

Sometimes the "ping: sendto: File too large" message does not appear for
many hours; sometimes it floods the console!
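
One way to quantify how often the error shows up is to save the ping output
to a file and count the failures afterwards. The sketch below assumes that
approach; the function name count_sendto_errors and the log path in the
usage comment are made up for illustration and are not part of the original
report. ping prints the sendto error on stderr, so both streams need to be
captured.

```shell
#!/bin/sh
# count_sendto_errors: read ping(8) output on stdin and print
# "<successful replies> <sendto errors>" on one line.
# Hypothetical helper name, not part of any FreeBSD tool.
count_sendto_errors() {
    awk '
        /bytes from/                   { ok++ }
        /ping: sendto: File too large/ { err++ }
        END { printf "%d %d\n", ok, err }
    '
}

# Example usage (the log path is an assumption):
#   ping 10.3.3.2 > /var/tmp/ping-ix1.log 2>&1 &
#   count_sendto_errors < /var/tmp/ping-ix1.log
```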

When this starts to happen, the logs on the other end (the ESXi machine)
show that the StorageApdHandler process starts a timer for the NFS share,
because it does not receive the NFS heartbeat back.

After a few seconds, the ESXi machine starts to show this in the log:

NFSLock: xxx: Stop accessing fd 0xxxxxxx x

After a few more seconds, the StorageApdHandler on the ESXi machine puts
the NFS share into the All Paths Down state and drops the NFS connection.

After this, if I try to ping the FreeBSD machine from the ESXi machine,
ESXi shows "host is down", and on the FreeBSD machine ping keeps repeating
the "ping: sendto: File too large" message.

The only thing that resolves this is ifconfig ix1 down followed by
ifconfig ix1 up.

After resetting the interface like this, the connection and the ping
sometimes work for minutes, sometimes for hours, and then the situation
described above starts again.
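
Until the root cause is found, the manual reset described above could be
automated. This is only a stopgap sketch, not a fix: the function name
reset_if_stuck and the PING/IFCONFIG override variables are hypothetical,
and the interface and peer address in the usage comment are the ones from
this report. It has to run as root to change the interface state.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; names and defaults are illustrative.
# PING and IFCONFIG can be overridden, e.g. for a dry run.
PING=${PING:-ping}
IFCONFIG=${IFCONFIG:-ifconfig}

# reset_if_stuck IFACE PEER
# Send one echo request to PEER; if ping exits non-zero, take IFACE
# down and bring it up again. Prints "ok" or "reset" so the caller
# can log what happened.
reset_if_stuck() {
    iface=$1
    peer=$2
    if $PING -c 1 "$peer" >/dev/null 2>&1; then
        echo ok
    else
        $IFCONFIG "$iface" down
        $IFCONFIG "$iface" up
        echo reset
    fi
}

# Example loop (the 5 second interval is arbitrary):
#   while :; do reset_if_stuck ix1 10.3.3.2; sleep 5; done
```

Resetting the interface interrupts the NFS traffic as well, so this only
shortens the outage; it does not prevent the ESXi side from noticing it.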

I have screenshots of the "ping: sendto: File too large" message.

We tried the default ixgbe driver and the newest one from Intel's
website.  Both drivers show the same issue.

We found that if the transfer rate over the 10Gbit connection exceeds
5Gbit/sec, the problem appears sooner, maybe in 20-40 minutes,
sometimes after 5 minutes.

If we leave the machines only pinging each other, the problem sometimes
does not appear for days, but it does appear eventually.

>How-To-Repeat:
Install an Intel X520 10gbit NIC into a FreeBSD 9.2 system.

Connect it to another host via 10gbit Ethernet. (We tried with ESXi
5.1 and 5.5.)

Start pinging the other end and leave it running for hours.

Generate some heavy traffic (utilise the connection above 5Gbit/sec),
for example via NFS to an ESXi 5.5 host on the other side.

Wait a few hours.
>Fix:


>Release-Note:
>Audit-Trail:

From: "Pataki Antal (Granaglia Kft.)" <pataki.antal@granaglia.com>
To: bug-followup@FreeBSD.org,
 Pataki Antal <pataki.antal@gmail.com>
Cc:  
Subject: Re: misc/183390: 10gigabit networking problems
Date: Wed, 30 Oct 2013 22:03:30 +0100

 Why is this non-critical?
 The other side drops the connection because of this. This is very
 critical, for example if the faulty system is a storage server...
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Thu Oct 31 02:43:11 UTC 2013 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=183390 
Responsible-Changed-From-To: freebsd-net->jfv 
Responsible-Changed-By: delphij 
Responsible-Changed-When: Mon Mar 17 22:41:15 UTC 2014 
Responsible-Changed-Why:  
Hi, Jack, 

Some FreeNAS users [1] have encountered a similar issue too; can you take 
a look at this one? 

Thanks in advance! 

[1] https://bugs.freenas.org/issues/4560 

http://www.freebsd.org/cgi/query-pr.cgi?pr=183390 

From: Christopher Forgeron <csforgeron@gmail.com>
To: bug-followup@FreeBSD.org, pataki.antal@gmail.com
Cc:  
Subject: Re: kern/183390: [ixgbe] 10gigabit networking problems
Date: Fri, 21 Mar 2014 00:08:56 -0300

 To keep you in the loop:
 
 I'm having a very similar problem in 10.0-RELEASE.
 
 We've made some headway: disabling TSO (ifconfig ix0 -tso) seems to avoid
 the symptom, but of course that's just a temporary fix.
 
 Try it, and see if you have stability again.
 
 The discussion is on the freebsd-net mailing list, at
 http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html
 
 It's a bit long, but follow along as it may help your situation. I hope to
 test changes to the TSO code tomorrow.
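
To confirm that the TSO workaround mentioned above actually took effect, the
options line of the ifconfig output can be checked for the TSO4/TSO6 flags.
A minimal sketch, assuming the usual options=<hex><FLAG,...> layout of
FreeBSD ifconfig output; the helper name tso_enabled is hypothetical.

```shell
#!/bin/sh
# tso_enabled: read `ifconfig IFACE` output on stdin and succeed if the
# options=...<...> line lists TSO4 or TSO6 as a separate flag.
# Flags that merely contain "TSO" (e.g. VLAN_HWTSO) do not match.
# Hypothetical helper name, not part of any FreeBSD tool.
tso_enabled() {
    grep -Eq 'options=[0-9a-f]+<([A-Z0-9_]+,)*TSO[46](,|>)'
}

# Example usage (interface name taken from the message above):
#   ifconfig ix0 -tso
#   if ifconfig ix0 | tso_enabled; then echo "TSO still on"; else echo "TSO off"; fi
```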

From: John Hickey <jjh@deterlab.net>
To: bug-followup@FreeBSD.org, pataki.antal@gmail.com
Cc:  
Subject: Re: kern/183390: [ixgbe] 10gigabit networking problems
Date: Sun, 27 Apr 2014 22:58:40 -0700

 I am seeing this too on 10.0-RELEASE.  Disabling TSO doesn't seem to 
 help either.  The server was under fairly heavy ZFS-related load at 
 the time.  The network was fairly quiet, since the NFS connections I 
 did have ended up hanging.
 
 System specs:
 
 FreeBSD 10.0-RELEASE-p1 #3 r264309: Wed Apr  9 17:01:09 PDT 2014
 2x Opteron 6128 (16 total cores)
 128GB RAM
 Intel X520 NIC
 ~22TB ZFS filesystem
>Unformatted:
