From nobody@FreeBSD.org  Tue Feb 26 19:13:38 2013
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
	by hub.freebsd.org (Postfix) with ESMTP id E849711C
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 26 Feb 2013 19:13:38 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id C8D9A19E9
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 26 Feb 2013 19:13:38 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.5/8.14.5) with ESMTP id r1QJDcM0099752
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 26 Feb 2013 19:13:38 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.5/8.14.5/Submit) id r1QJDcLF099751;
	Tue, 26 Feb 2013 19:13:38 GMT
	(envelope-from nobody)
Message-Id: <201302261913.r1QJDcLF099751@red.freebsd.org>
Date: Tue, 26 Feb 2013 19:13:38 GMT
From: Julien Charbon <jcharbon@verisign.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Concurrency in ixgbe driving out-of-order packet process and spurious RST
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         176446
>Category:       kern
>Synopsis:       [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    jfv
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Feb 26 19:20:00 UTC 2013
>Closed-Date:    
>Last-Modified:  Wed Jan 29 18:20:00 UTC 2014
>Originator:     Julien Charbon
>Release:        FreeBSD 8.3
>Organization:
Verisign
>Environment:
FreeBSD atlas-dl360-3 8.3-RELEASE FreeBSD 8.3-RELEASE #14 r50M: Tue Feb 26 15:13:53 UTC 2013     jcharbon@atlas-dl360-3:/usr/obj/usr/src/jcharbon/svn/freebsd-src-8.3.4/sys/GENERIC  amd64
>Description:
Under TCP network load using the ixgbe driver, we found an unexpected TCP behaviour:

15:26:45.129164 IP 192.168.100.23.30222 > 192.168.100.144.8080: Flags [S], seq 2020028671, win 14600, options [mss 1460,sackOK,TS val 1622018533 ecr 0,nop,wscale 7], length 0
15:26:45.130844 IP 192.168.100.144.8080 > 192.168.100.23.30222: Flags [S.], seq 1114608110, ack 2020028672, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2170246269 ecr 1622018533], length 0
15:26:47.007276 IP 192.168.100.23.30222 > 192.168.100.144.8080: Flags [.], ack 1, win 115, options [nop,nop,TS val 1622020597 ecr 2170246269], length 0
15:26:47.007283 IP 192.168.100.23.30222 > 192.168.100.144.8080: Flags [P.], ack 1, win 115, options [nop,nop,TS val 1622020597 ecr 2170246269], length 4
15:26:47.013799 IP 192.168.100.144.8080 > 192.168.100.23.30222: Flags [R], seq 1114608111, win 0, length 0
15:26:47.019366 IP 192.168.100.144.8080 > 192.168.100.23.30222: Flags [P.], ack 5, win 1040, options [nop,nop,TS val 2170248157 ecr 1622020597], length 128
15:26:47.020353 IP 192.168.100.144.8080 > 192.168.100.23.30222: Flags [F.], seq 129, ack 5, win 1040, options [nop,nop,TS val 2170248158 ecr 1622020597], length 0
15:26:48.565166 IP 192.168.100.23.30222 > 192.168.100.144.8080: Flags [R], seq 2020028676, win 0, length 0

This TCP connection was both reset (RST) _and_ answered with data. With net.inet.tcp.log_debug=1, the debug log shows:

Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080; syncache_socket: in_pcbconnect failed with error 48
Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080 tcpflags 0x10<ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST
Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored

After adding more debug logging to the kernel (see attached patch), the origin of this unexpected behaviour appears to be in the ixgbe driver:

Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080 tcpflags 0x18<PUSH,ACK>; syncache_expand: SEQ 2020028672, ACK 1114608111, syncache_socket() succeed
Feb 26 15:26:47 atlas-dl360-3 kernel: #0 0xffffffff80796b86 at syncache_expand+0x546
Feb 26 15:26:47 atlas-dl360-3 kernel: #1 0xffffffff8078e36b at tcp_input+0xf4b
Feb 26 15:26:47 atlas-dl360-3 kernel: #2 0xffffffff8071feec at ip_input+0xac
Feb 26 15:26:47 atlas-dl360-3 kernel: #3 0xffffffff806cb11e at netisr_dispatch_src+0x7e
Feb 26 15:26:47 atlas-dl360-3 kernel: #4 0xffffffff806c11bd at ether_demux+0x14d
Feb 26 15:26:47 atlas-dl360-3 kernel: #5 0xffffffff806c15c7 at ether_input+0x197
Feb 26 15:26:47 atlas-dl360-3 kernel: #6 0xffffffff803e691b at ixgbe_rxeof+0x1eb
Feb 26 15:26:47 atlas-dl360-3 kernel: #7 0xffffffff803e7108 at ixgbe_msix_que+0xa8
Feb 26 15:26:47 atlas-dl360-3 kernel: #8 0xffffffff805e7674 at intr_event_execute_handlers+0x104
Feb 26 15:26:47 atlas-dl360-3 kernel: #9 0xffffffff805e8d05 at ithread_loop+0x95
Feb 26 15:26:47 atlas-dl360-3 kernel: #10 0xffffffff805e488f at fork_exit+0x11f
Feb 26 15:26:47 atlas-dl360-3 kernel: #11 0xffffffff808edc7e at fork_trampoline+0xe
Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080; syncache_socket: in_pcbconnect failed with error 48
Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080 tcpflags 0x10<ACK>; syncache_expand: SEQ 2020028672, ACK 1114608111, syncache_socket() failed
Feb 26 15:26:47 atlas-dl360-3 kernel: #0 0xffffffff80796b86 at syncache_expand+0x546
Feb 26 15:26:47 atlas-dl360-3 kernel: #1 0xffffffff8078e36b at tcp_input+0xf4b
Feb 26 15:26:47 atlas-dl360-3 kernel: #2 0xffffffff8071feec at ip_input+0xac
Feb 26 15:26:47 atlas-dl360-3 kernel: #3 0xffffffff806cb11e at netisr_dispatch_src+0x7e
Feb 26 15:26:47 atlas-dl360-3 kernel: #4 0xffffffff806c11bd at ether_demux+0x14d
Feb 26 15:26:47 atlas-dl360-3 kernel: #5 0xffffffff806c15c7 at ether_input+0x197
Feb 26 15:26:47 atlas-dl360-3 kernel: #6 0xffffffff803e691b at ixgbe_rxeof+0x1eb
Feb 26 15:26:47 atlas-dl360-3 kernel: #7 0xffffffff803e7c31 at ixgbe_handle_que+0xd1
Feb 26 15:26:47 atlas-dl360-3 kernel: #8 0xffffffff8064e2c5 at taskqueue_run_locked+0x85
Feb 26 15:26:47 atlas-dl360-3 kernel: #9 0xffffffff8064e45e at taskqueue_thread_loop+0x4e
Feb 26 15:26:47 atlas-dl360-3 kernel: #10 0xffffffff805e488f at fork_exit+0x11f
Feb 26 15:26:47 atlas-dl360-3 kernel: #11 0xffffffff808edc7e at fork_trampoline+0xe
Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080 tcpflags 0x10<ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST
Feb 26 15:26:47 atlas-dl360-3 kernel: TCP: [192.168.100.23]:30222 to [192.168.100.144]:8080 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored

Here we see ixgbe_handle_que() and ixgbe_msix_que() both calling ixgbe_rxeof() at the same time, and the client <PUSH,ACK> packet being processed before the client <ACK> packet.

Although we are currently testing FreeBSD 8.3, we saw nothing that would prevent this issue in 9.1 or in CURRENT.
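
The outcome seen in the log can be modelled in a few lines of C. This is only a toy sketch, not the kernel code: `toy_table`, `toy_tuple` and `toy_syncache_expand()` are hypothetical names, and it assumes that both client segments were steered to the listen socket before either of them had created the connection. The only detail taken from the log is that the second segment to arrive fails with error 48 (EADDRINUSE on FreeBSD), which is what leads to the RST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_EADDRINUSE	48	/* value from the log; EADDRINUSE on FreeBSD */
#define TOY_MAX_CONNS	8	/* arbitrary size for this sketch */

/* Hypothetical 4-tuple identifying one TCP connection. */
struct toy_tuple {
	uint32_t faddr, laddr;
	uint16_t fport, lport;
};

/* Hypothetical stand-in for the table of established connections. */
struct toy_table {
	struct toy_tuple conns[TOY_MAX_CONNS];
	int n;
};

static bool
toy_tuple_eq(const struct toy_tuple *a, const struct toy_tuple *b)
{
	return (a->faddr == b->faddr && a->laddr == b->laddr &&
	    a->fport == b->fport && a->lport == b->lport);
}

/*
 * Toy version of "complete the handshake for this segment": returns 0 if
 * a new connection was created, or TOY_EADDRINUSE if the 4-tuple is
 * already connected (the caller would then send a RST).
 */
static int
toy_syncache_expand(struct toy_table *t, const struct toy_tuple *tp)
{
	for (int i = 0; i < t->n; i++)
		if (toy_tuple_eq(&t->conns[i], tp))
			return (TOY_EADDRINUSE);
	assert(t->n < TOY_MAX_CONNS);
	t->conns[t->n++] = *tp;
	return (0);
}
```

Calling `toy_syncache_expand()` twice with the same tuple, first for the <PUSH,ACK> and then for the late <ACK>, succeeds the first time and returns 48 the second time, matching the "succeed" then "failed" order in the log above.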
>How-To-Repeat:
Load a TCP service heavily enough to trigger the "syncache_socket: in_pcbconnect failed with error 48" error.
>Fix:


Patch attached with submission follows:

Index: sys/netinet/tcp_syncache.c
===================================================================
--- sys/netinet/tcp_syncache.c	(revision 50)
+++ sys/netinet/tcp_syncache.c	(working copy)
@@ -97,6 +97,11 @@
 
 #include <security/mac/mac_framework.h>
 
+#include <sys/param.h>
+#include <sys/stack.h>
+#include <sys/types.h>
+#include <sys/sbuf.h>
+
 static VNET_DEFINE(int, tcp_syncookies) = 1;
 #define	V_tcp_syncookies		VNET(tcp_syncookies)
 SYSCTL_VNET_INT(_net_inet_tcp, OID_AUTO, syncookies, CTLFLAG_RW,
@@ -200,6 +205,11 @@
 #define	SCH_UNLOCK(sch)		mtx_unlock(&(sch)->sch_mtx)
 #define	SCH_LOCK_ASSERT(sch)	mtx_assert(&(sch)->sch_mtx, MA_OWNED)
 
+struct stack *st;
+struct mtx st_mtx;
+struct sbuf st_sb;
+char st_trace[65536];
+
 /*
  * Requires the syncache entry to be already removed from the bucket list.
  */
@@ -223,6 +233,10 @@
 {
 	int i;
 
+	st = stack_create();
+	mtx_init(&st_mtx, "tcp_sc_stack", NULL, MTX_DEF);
+	sbuf_new(&st_sb, st_trace, sizeof(st_trace), SBUF_FIXEDLEN);
+
 	V_tcp_syncache.cache_count = 0;
 	V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE;
 	V_tcp_syncache.bucket_limit = TCP_SYNCACHE_BUCKETLIMIT;
@@ -277,6 +291,10 @@
 	struct syncache *sc, *nsc;
 	int i;
 
+	sbuf_delete(&st_sb);
+	mtx_destroy(&st_mtx);
+	stack_destroy(st);
+
 	/* Cleanup hash buckets: stop timers, free entries, destroy locks. */
 	for (i = 0; i < V_tcp_syncache.hashsize; i++) {
 
@@ -949,11 +967,27 @@
 
 	*lsop = syncache_socket(sc, *lsop, m);
 
-	if (*lsop == NULL)
+	if (*lsop == NULL) {
+		if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
+			log(LOG_DEBUG, "%s; %s: SEQ %u, ACK %u, syncache_socket() "
+			    "failed\n", s, __func__, th->th_seq, th->th_ack);
 		TCPSTAT_INC(tcps_sc_aborted);
-	else
+        }
+	else {
+		if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
+			log(LOG_DEBUG, "%s; %s: SEQ %u, ACK %u, syncache_socket() "
+			    "succeed\n", s, __func__, th->th_seq, th->th_ack);
 		TCPSTAT_INC(tcps_sc_completed);
+	}
 
+	mtx_lock(&st_mtx);
+	stack_save(st);
+	stack_sbuf_print(&st_sb, st);
+	sbuf_finish(&st_sb);
+	log(LOG_DEBUG, "%s", sbuf_data(&st_sb));
+	sbuf_clear(&st_sb);
+	mtx_unlock(&st_mtx);
+
 /* how do we find the inp for the new socket? */
 	if (sc != &scs)
 		syncache_free(sc);


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed Feb 27 00:09:22 UTC 2013 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=176446 

From: "Charbon, Julien" <jcharbon@verisign.com>
To: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
Cc: jfv@FreeBSD.org, "De La Gueronniere, Marc" <mdelagueronniere@verisign.com>
Subject: Re: kern/176446: Concurrency in ixgbe driving out-of-order packet
 process and spurious RST
Date: Wed, 27 Feb 2013 13:54:20 +0100

   I successfully reproduced this issue using only the socat tool:
 
 Server side:  Enable TCP debug, and launch socat server:
 
 # sysctl net.inet.tcp.log_debug=1
 net.inet.tcp.log_debug: 0 -> 1
 $ cat /some/where/response.sh
 #!/usr/bin/env bash
 read line
 echo -n "01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567"
 $ socat TCP4-LISTEN:8181,fork,reuseaddr EXEC:/some/where/response.sh
 
 Client side:  Create socat client TCP load:
 
 $ cat ./request
 128
 $ while true; do socat ./request,ignoreeof\!\!./response TCP4:192.168.100.144:8181 & sleep 0.0001; done
 
 Then, on the server side, wait for TCP debug messages in /var/log/debug.log such as:
 
 kernel: TCP: [192.168.100.136]:48359 to [192.168.100.144]:8181; 
 syncache_socket: in_pcbconnect failed with error 48
 kernel: TCP: [192.168.100.136]:48359 to [192.168.100.144]:8181 tcpflags 
 0x10<ACK>; tcp_input: Listen socket: Socket allocation failed due to 
 limits or memory shortage, sending RST
 kernel: TCP: [192.168.100.136]:48359 to [192.168.100.144]:8181 tcpflags 
 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry 
 (possibly syncookie only), segment ignored
 
   Adding Jack F. Vogel (ixgbe driver maintainer), and Marc De La 
 Gueronniere (author of our work in progress patch for this issue).
 
 --
 Julien

From: John Baldwin <jhb@freebsd.org>
To: bug-followup@freebsd.org,
 jcharbon@verisign.com
Cc:  
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST
Date: Thu, 28 Feb 2013 10:57:24 -0500

 Can you try the fixes from http://svnweb.freebsd.org/base?view=revision&revision=240968?
 
 -- 
 John Baldwin

From: "Charbon, Julien" <jcharbon@verisign.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org,
        "De La Gueronniere, Marc" <mdelagueronniere@verisign.com>,
        jfv@freebsd.org
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order
 packet process and spurious RST
Date: Thu, 28 Feb 2013 20:10:39 +0100

 On 2/28/13 4:57 PM, John Baldwin wrote:
 > Can you try the fixes from http://svnweb.freebsd.org/base?view=revision&revision=240968?
 
   Actually, Marc (I CC'ed him) had already found the r240968 fix for the 
 concurrency between ixgbe_msix_que() and ixgbe_handle_que(), and backported 
 it to release-8.3.0 (see patch [1] below).  However, the issue was still 
 reproducible.  Marc then found a second source of concurrency in 
 ixgbe_local_timer() and fixed it (see patch [2]).  That was still not 
 enough, and he found a last source of concurrency in the 
 ixgbe_rearm_queues() calls (see patch [3]).  With all these patches 
 applied, we were no longer able to reproduce this issue.
 
   While patches [1] and [2] seem clearly legitimate, patch [3] probably 
 needs more discussion/feedback.
 
 --
 Julien
 
 [1] Patch ixgbe (1/3): Backport r240968 in release-8.3.0
 
 Index: sys/dev/ixgbe/ixgbe.c
 ===================================================================
 --- sys/dev/ixgbe/ixgbe.c
 +++ sys/dev/ixgbe/ixgbe.c
 @@ -102,13 +102,15 @@
   static int      ixgbe_attach(device_t);
   static int      ixgbe_detach(device_t);
   static int      ixgbe_shutdown(device_t);
 -static void     ixgbe_start(struct ifnet *);
 -static void     ixgbe_start_locked(struct tx_ring *, struct ifnet *);
   #if __FreeBSD_version >= 800000
   static int	ixgbe_mq_start(struct ifnet *, struct mbuf *);
   static int	ixgbe_mq_start_locked(struct ifnet *,
                       struct tx_ring *, struct mbuf *);
   static void	ixgbe_qflush(struct ifnet *);
 +static void	ixgbe_deferred_mq_start(void *, int);
 +#else
 +static void     ixgbe_start(struct ifnet *);
 +static void     ixgbe_start_locked(struct tx_ring *, struct ifnet *);
   #endif
   static int      ixgbe_ioctl(struct ifnet *, u_long, caddr_t);
   static void	ixgbe_init(void *);
 @@ -645,6 +647,7 @@
   {
   	struct adapter *adapter = device_get_softc(dev);
   	struct ix_queue *que = adapter->queues;
 +	struct tx_ring *txr = adapter->tx_rings;
   	u32	ctrl_ext;
 
   	INIT_DEBUGOUT("ixgbe_detach: begin");
 @@ -659,8 +662,11 @@
   	ixgbe_stop(adapter);
   	IXGBE_CORE_UNLOCK(adapter);
 
 -	for (int i = 0; i < adapter->num_tx_queues; i++, que++) {
 +	for (int i = 0; i < adapter->num_tx_queues; i++, que++, txr++) {
   		if (que->tq) {
 +#if __FreeBSD_version >= 800000
 +			taskqueue_drain(que->tq, &txr->txq_task);
 +#endif
   			taskqueue_drain(que->tq, &que->que_task);
   			taskqueue_free(que->tq);
   		}
 @@ -722,6 +728,7 @@
   }
 
 
 +#if __FreeBSD_version < 800000
   /*********************************************************************
    *  Transmit entry point
    *
 @@ -793,7 +800,7 @@
   	return;
   }
 
 -#if __FreeBSD_version >= 800000
 +#else
   /*
   ** Multiqueue Transmit driver
   **
 @@ -821,7 +828,7 @@
   		IXGBE_TX_UNLOCK(txr);
   	} else {
   		err = drbr_enqueue(ifp, txr->br, m);
 -		taskqueue_enqueue(que->tq, &que->que_task);
 +		taskqueue_enqueue(que->tq, &txr->txq_task);
   	}
 
   	return (err);
 @@ -887,6 +894,22 @@
   }
 
   /*
 + * Called from a taskqueue to drain queued transmit packets.
 + */
 +static void
 +ixgbe_deferred_mq_start(void *arg, int pending)
 +{
 +	struct tx_ring *txr = arg;
 +	struct adapter *adapter = txr->adapter;
 +	struct ifnet *ifp = adapter->ifp;
 +
 +	IXGBE_TX_LOCK(txr);
 +	if (!drbr_empty(ifp, txr->br))
 +		ixgbe_mq_start_locked(ifp, txr, NULL);
 +	IXGBE_TX_UNLOCK(txr);
 +}
 +
 +/*
   ** Flush all ring buffers
   */
   static void
 @@ -2210,6 +2233,9 @@
   {
   	device_t dev = adapter->dev;
   	struct		ix_queue *que = adapter->queues;
 +#if __FreeBSD_version >= 800000
 +	struct tx_ring		*txr = adapter->tx_rings;
 +#endif
   	int error, rid = 0;
 
   	/* MSI RID at 1 */
 @@ -2229,6 +2255,9 @@
   	 * Try allocating a fast interrupt and the associated deferred
   	 * processing contexts.
   	 */
 +#if __FreeBSD_version >= 800000
 +	TASK_INIT(&txr->txq_task, 0, ixgbe_deferred_mq_start, txr);
 +#endif
   	TASK_INIT(&que->que_task, 0, ixgbe_handle_que, que);
   	que->tq = taskqueue_create_fast("ixgbe_que", M_NOWAIT,
               taskqueue_thread_enqueue, &que->tq);
 @@ -2275,9 +2304,10 @@
   {
   	device_t        dev = adapter->dev;
   	struct 		ix_queue *que = adapter->queues;
 +	struct  	tx_ring *txr = adapter->tx_rings;
   	int 		error, rid, vector = 0;
 
 -	for (int i = 0; i < adapter->num_tx_queues; i++, vector++, que++) {
 +	for (int i = 0; i < adapter->num_tx_queues; i++, vector++, que++, txr++) {
   		rid = vector + 1;
   		que->res = bus_alloc_resource_any(dev, SYS_RES_IRQ, &rid,
   		    RF_SHAREABLE | RF_ACTIVE);
 @@ -2307,6 +2337,9 @@
   		if (adapter->num_tx_queues > 1)
   			bus_bind_intr(dev, que->res, i);
 
 +#if __FreeBSD_version >= 800000
 +		TASK_INIT(&txr->txq_task, 0, ixgbe_deferred_mq_start, txr);
 +#endif
   		TASK_INIT(&que->que_task, 0, ixgbe_handle_que, que);
   		que->tq = taskqueue_create_fast("ixgbe_que", M_NOWAIT,
   		    taskqueue_thread_enqueue, &que->tq);
 @@ -2555,12 +2588,13 @@
   	ifp->if_softc = adapter;
   	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
   	ifp->if_ioctl = ixgbe_ioctl;
 -	ifp->if_start = ixgbe_start;
   #if __FreeBSD_version >= 800000
   	ifp->if_transmit = ixgbe_mq_start;
   	ifp->if_qflush = ixgbe_qflush;
 +#else
 +	ifp->if_start = ixgbe_start;
 +	IFQ_SET_MAXLEN(&ifp->if_snd, adapter->num_tx_desc - 2);
   #endif
 -	ifp->if_snd.ifq_maxlen = adapter->num_tx_desc - 2;
 
   	ether_ifattach(ifp, adapter->hw.mac.addr);
 
 Index: sys/dev/ixgbe/ixgbe.h
 ===================================================================
 --- sys/dev/ixgbe/ixgbe.h
 +++ sys/dev/ixgbe/ixgbe.h
 @@ -298,6 +298,7 @@
   	char			mtx_name[16];
   #if __FreeBSD_version >= 800000
   	struct buf_ring		*br;
 +	struct task		txq_task;
   #endif
   #ifdef IXGBE_FDIR
   	u16			atr_sample;
 
 [2] Patch ixgbe (2/3): Do not schedule ixgbe_handle_que() from 
 ixgbe_local_timer().
 
 Index: sys/dev/ixgbe/ixgbe.c
 ===================================================================
 --- sys/dev/ixgbe/ixgbe.c
 +++ sys/dev/ixgbe/ixgbe.c
 @@ -2033,7 +2033,7 @@
   		if (txr->queue_status & IXGBE_QUEUE_DEPLETED)
   			++busy;
   		if ((txr->queue_status & IXGBE_QUEUE_IDLE) == 0)
 -			taskqueue_enqueue(que->tq, &que->que_task);
 +			taskqueue_enqueue(que->tq, &txr->txq_task);
           }
   	/* Only truely watchdog if all queues show hung */
           if (hung == adapter->num_tx_queues)
 
 [3] Patch ixgbe (3/3): ixgbe_rearm_queues() directly schedules an 
 interrupt and causes unwanted concurrency; should we call it at all?
 
 Index: sys/dev/ixgbe/ixgbe.c
 ===================================================================
 --- sys/dev/ixgbe/ixgbe.c
 +++ sys/dev/ixgbe/ixgbe.c
 @@ -2046,7 +2046,7 @@
                   ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 
   out:
 -       ixgbe_rearm_queues(adapter, adapter->que_mask);
 +       // ixgbe_rearm_queues(adapter, adapter->que_mask);
          callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
          return;
 
 @@ -4674,7 +4674,7 @@
          ** Schedule another interrupt if so.
          */
          if ((staterr & IXGBE_RXD_STAT_DD) != 0) {
 -               ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
 +               // ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
                  return (TRUE);
          }
 

From: "Charbon, Julien" <jcharbon@verisign.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org,
        "De La Gueronniere, Marc" <mdelagueronniere@verisign.com>
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order
 packet process and spurious RST
Date: Thu, 07 Mar 2013 11:11:25 +0100

 On 2/28/13 8:10 PM, Charbon, Julien wrote:
 > On 2/28/13 4:57 PM, John Baldwin wrote:
 >> Can you try the fixes from http://svnweb.freebsd.org/base?view=revision&revision=240968?
 >
 >    Actually, Marc (I CC'ed him) did find the r240968 fix for concurrency
 > between ixgbe_msix_que() and ixgbe_handle_que(), and made a backport for
 > release-8.3.0 (see patch [1] below).  However, the issue was still
 > reproducible, then Marc found another place for concurrency from
 > ixgbe_local_timer() and fix it (see patch [2]).  But it was still not
 > enough, and he found a last place for concurrency due to
 > ixgbe_rearm_queues() call (see patch [3]).  We all these patches
 > applied, we were not able to reproduce this issue.
 
   Just for the record:  As expected this issue is reproducible on 
 9.1-RELEASE:
 
 # uname -a
 FreeBSD atlas 9.1-RELEASE FreeBSD 9.1-RELEASE #1 r247851M: Wed Mar  6 
 11:17:43 UTC 2013 
 jcharbon@atlas:/usr/obj/app/jcharbon/9.1.0/sys/GENERIC  amd64
 
   Enable TCP debug log:
 
 # sysctl net.inet.tcp.log_debug=1
 
   Load a TCP service heavily enough and, due to the ixgbe race conditions 
 between ixgbe_msix_que() and ixgbe_handle_que(), you will get:
 
 Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 [192.168.100.152]:8080; syncache_socket: in_pcbconnect failed with error 48
 Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 [192.168.100.152]:8080 tcpflags 0x10<ACK>; tcp_input: Listen socket: 
 Socket allocation failed due to limits or memory shortage, sending RST
 Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 [192.168.100.152]:8080 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST 
 without matching syncache entry (possibly syncookie only), segment ignored
 
   We will provide our current fix patch for 9.1-RELEASE.
 
 --
 Julien

From: "Charbon, Julien" <jcharbon@verisign.com>
To: Cc: bug-followup@freebsd.org,
        "De La Gueronniere, Marc" <mdelagueronniere@verisign.com>
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order
 packet process and spurious RST
Date: Tue, 12 Mar 2013 14:59:11 +0100

 This is a multi-part message in MIME format.
 --------------050609010000050001050509
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit
 
 On 3/7/13 11:11 AM, Charbon, Julien wrote:
 > On 2/28/13 8:10 PM, Charbon, Julien wrote:
 >> On 2/28/13 4:57 PM, John Baldwin wrote:
 >>> Can you try the fixes from http://svnweb.freebsd.org/base?view=revision&revision=240968?
 >>
 >>     Actually, Marc (I CC'ed him) did find the r240968 fix for concurrency
 >> between ixgbe_msix_que() and ixgbe_handle_que(), and made a backport for
 >> release-8.3.0 (see patch [1] below).  However, the issue was still
 >> reproducible, then Marc found another place for concurrency from
 >> ixgbe_local_timer() and fix it (see patch [2]).  But it was still not
 >> enough, and he found a last place for concurrency due to
 >> ixgbe_rearm_queues() call (see patch [3]).  We all these patches
 >> applied, we were not able to reproduce this issue.
 >
 >    Just for the record:  As expected this issue is reproducible on
 > 9.1-RELEASE:
 
   Just for the record:  As expected, this issue is also reproducible on
 10.0-CURRENT:
 
 # uname -a
 FreeBSD atlas 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r248173M: Tue Mar 12 
 07:52:58 UTC 2013 
 jcharbon@atlas:/usr/obj/app/jcharbon/head/sys/GENERIC  amd64
 
   1. Enable TCP debug log:
 
 # sysctl net.inet.tcp.log_debug=1
 net.inet.tcp.log_debug: 1 -> 1
 
   2. Load a TCP service with numerous small requests/responses.
 
   3. Look in /var/log/debug.log for the pattern:
 
 Mar 12 10:31:22 atlas kernel: TCP: [192.168.100.35]:4698 to 
 [192.168.100.152]:8080; syncache_socket: in_pcbconnect failed with error 48
 Mar 12 10:31:22 atlas-dl360-4 kernel: TCP: [192.168.100.35]:4698 to 
 [192.168.100.152]:8080 tcpflags 0x10<ACK>; tcp_input: Listen socket: 
 Socket allocation failed due to limits or memory shortage, sending RST
 Mar 12 10:31:22 atlas-dl360-4 kernel: TCP: [192.168.100.35]:4698 to 
 [192.168.100.152]:8080 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST 
 without matching syncache entry (possibly syncookie only), segment ignored
 
   Attached is the patch we use to fix this issue in 10-CURRENT.
 
 --
 Julien
 
 --------------050609010000050001050509
 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0";
  name="ixgbe.c.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="ixgbe.c.patch"
 
 Index: sys/dev/ixgbe/ixgbe.c
 ===================================================================
 --- sys/dev/ixgbe/ixgbe.c	(revision 248173)
 +++ sys/dev/ixgbe/ixgbe.c	(working copy)
 @@ -2038,14 +2038,14 @@
  		    (paused == 0))
  			++hung;
  		else if (txr->queue_status == IXGBE_QUEUE_WORKING)
 -			taskqueue_enqueue(que->tq, &que->que_task);
 +			taskqueue_enqueue(que->tq, &txr->txq_task);
          }
  	/* Only truely watchdog if all queues show hung */
          if (hung == adapter->num_queues)
                  goto watchdog;
  
  out:
 -	ixgbe_rearm_queues(adapter, adapter->que_mask);
 +	// ixgbe_rearm_queues(adapter, adapter->que_mask);
  	callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
  	return;
  
 @@ -4575,7 +4575,7 @@
  	** Schedule another interrupt if so.
  	*/
  	if ((staterr & IXGBE_RXD_STAT_DD) != 0) {
 -		ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
 +		// ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
  		return (TRUE);
  	}
  
 
 --------------050609010000050001050509--

From: John Baldwin <jhb@freebsd.org>
To: "Charbon, Julien" <jcharbon@verisign.com>
Cc: bug-followup@freebsd.org,
 "De La Gueronniere, Marc" <mdelagueronniere@verisign.com>,
 jfv@freebsd.org
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST
Date: Thu, 14 Mar 2013 09:34:18 -0400

 On Thursday, March 07, 2013 5:11:25 am Charbon, Julien wrote:
 > On 2/28/13 8:10 PM, Charbon, Julien wrote:
 > > On 2/28/13 4:57 PM, John Baldwin wrote:
 > >> Can you try the fixes from 
 http://svnweb.freebsd.org/base?view=revision&revision=240968?
 > >
 > >    Actually, Marc (I CC'ed him) did find the r240968 fix for concurrency
 > > between ixgbe_msix_que() and ixgbe_handle_que(), and made a backport for
 > > release-8.3.0 (see patch [1] below).  However, the issue was still
 > > reproducible, then Marc found another place for concurrency from
 > > ixgbe_local_timer() and fix it (see patch [2]).  But it was still not
 > > enough, and he found a last place for concurrency due to
 > > ixgbe_rearm_queues() call (see patch [3]).  We all these patches
 > > applied, we were not able to reproduce this issue.
 > 
 >   Just for the record:  As expected this issue is reproducible on 
 > 9.1-RELEASE:
 > 
 > # uname -a
 > FreeBSD atlas 9.1-RELEASE FreeBSD 9.1-RELEASE #1 r247851M: Wed Mar  6 
 > 11:17:43 UTC 2013 
 > jcharbon@atlas:/usr/obj/app/jcharbon/9.1.0/sys/GENERIC  amd64
 > 
 >   Enable TCP debug log:
 > 
 > # sysctl net.inet.tcp.log_debug=1
 > 
 >   Load enough a TCP service and due to ixgbe race conditions between 
 > ixgbe_msix_que() and ixgbe_handle_que(), you will get:
 > 
 > Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 > [192.168.100.152]:8080; syncache_socket: in_pcbconnect failed with error 48
 > Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 > [192.168.100.152]:8080 tcpflags 0x10<ACK>; tcp_input: Listen socket: 
 > Socket allocation failed due to limits or memory shortage, sending RST
 > Mar  7 10:01:04 atlas kernel: TCP: [192.168.100.21]:12918 to 
 > [192.168.100.152]:8080 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST 
 > without matching syncache entry (possibly syncookie only), segment ignored
 > 
 >   We will provide our current fix patch for 9.1-RELEASE.
 
 The place you noticed in 2) is broken, though your fix isn't quite correct.  
 I've been hesitant to reply yet as it requires a long reply.  The short 
 version is that the task to handle rx/tx processing should never be queued by 
 anything other than an interrupt handler or itself (when it reschedules 
 itself).  Anything else that schedules it is going to result in lock 
 contention and out-of-order packet delivery.
 
 Your 3rd case is also correct.  We should not re-enable interrupts on every 
 timer tick since the rx/tx task might already be running.  Similarly, 
 re-enabling all queues anytime one queue processes RX interrupts can trigger 
 an interrupt on another queue while its rx/tx task is already running.  Both 
 of these are pointless as each queue will rearm itself when the rx/tx task 
 finds no more pending RX packets to process.
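 
 The scheduling rule above can be sketched as a small single-threaded model.  
 This is an illustration only, not driver code: `toy_queue`, 
 `toy_msix_que()` and `toy_handle_que()` are hypothetical names, and the 
 model assumes the interrupt handler masks the queue interrupt before it 
 schedules the task.  The task is queued only by the interrupt handler or by 
 itself, and the interrupt is rearmed only once no RX packets are left.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state for this model. */
struct toy_queue {
	int	rx_pending;	/* packets waiting in the RX ring */
	bool	irq_enabled;	/* is the queue interrupt armed? */
	bool	task_queued;	/* is the rx/tx task scheduled? */
	int	task_runs;	/* how many times the task has run */
};

/* Interrupt handler: mask the queue interrupt, then schedule the task. */
static void
toy_msix_que(struct toy_queue *q)
{
	if (!q->irq_enabled)
		return;		/* masked, so no interrupt is delivered */
	q->irq_enabled = false;
	q->task_queued = true;
}

/* rx/tx task: process up to 'budget' packets, then reschedule or rearm. */
static void
toy_handle_que(struct toy_queue *q, int budget)
{
	assert(budget > 0);
	q->task_queued = false;
	q->task_runs++;
	while (budget-- > 0 && q->rx_pending > 0)
		q->rx_pending--;
	if (q->rx_pending > 0)
		q->task_queued = true;	/* more work: reschedule itself */
	else
		q->irq_enabled = true;	/* idle: rearm the interrupt */
}
```

 With 5 pending packets and a budget of 2, the task runs three times and the 
 interrupt stays masked until the last run, so a second context never enters 
 RX processing for the same queue in this model.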
 
 Now, some more details on the 2nd one which is due to watchdog handling which 
 is broken in both igb and ixgbe.  First, some background on how watchdog 
 handling works in nearly all other drivers (and specifically in single-queue 
 drivers):
 
 First, each device maintains a 'timer' field in the softc which is a count of 
 seconds until the transmit watchdog should expire.  Whenever a packet is 
 queued for transmit in the descriptor ring, it is set to 'N' seconds (e.g. 
 5).  Whenever the transmit completion interrupt fully drains the descriptor 
 ring such that the ring is idle the timer is set to 0.
 
 Second, each device runs a periodic stats timer that fires once a second while 
 the interface is "up" (so it is started in the foo_init() routine and stopped 
 in foo_stop()).  Part of this timer's job is to check the transmit watchdog.  
 It uses logic like this to do so:
 
    if (timer > 0) {
        timer--;
        if (timer == 0) {
            /* watchdog expired */
        }
    }
 
 The typical implementation for the watchdog expiring is to just reset the chip 
 by doing 'foo_stop()' followed by 'foo_init_locked()'.  However, if you have a 
 NIC whose hardware is known to have a quirk where it can lose interrupts, then 
 a driver can decide to scan the TX ring to see if it makes any progress.  It 
 should do this synchronously from the timer, not by scheduling another task.  
 Also, if you do make progress, then you should reset the watchdog timer if 
 there are still any pending transmits.  In this case I would suggest only 
 setting it to '1' so you check it on the next second.  The pseudo-code for 
 this would look something like:
 
     if (timer > 0) {
         timer--;
         if (timer == 0) {
             /* Have this return true if it finds any completions */
             if (foo_txintr()) {
                 if (tx ring is not empty)
                     timer = 1;
                 return;
             }
             foo_stop();
             foo_init_locked();
         }
     }
 
 However, most drivers don't need that sort of complication at all as working 
 hardware shouldn't be regularly failing to schedule interrupts.
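 
 For reference, the countdown scheme and the lost-interrupt variant 
 described above could look roughly like this as compilable C.  It is a 
 sketch under assumptions, not code from any driver: `toy_softc`, 
 `toy_encap()`, `toy_txintr()`, `toy_reset()` and `toy_tick()` are 
 hypothetical, the hardware is reduced to two counters, and 
 `TOY_WATCHDOG` uses the example value of 5 seconds.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_WATCHDOG	5	/* seconds; the 'N' from the text */

/* Hypothetical softc: the hardware is modelled by two counters. */
struct toy_softc {
	int	timer;		/* seconds until watchdog fires, 0 = off */
	int	tx_pending;	/* descriptors queued, not yet reclaimed */
	int	tx_done_hw;	/* finished by hardware, interrupt lost */
	int	resets;		/* number of chip resets performed */
};

/* Queue one packet for transmit and arm the watchdog. */
static void
toy_encap(struct toy_softc *sc)
{
	sc->tx_pending++;
	sc->timer = TOY_WATCHDOG;
}

/* Reclaim completed descriptors; returns true if any were found. */
static bool
toy_txintr(struct toy_softc *sc)
{
	bool progress = sc->tx_done_hw > 0;

	assert(sc->tx_done_hw <= sc->tx_pending);
	sc->tx_pending -= sc->tx_done_hw;
	sc->tx_done_hw = 0;
	if (sc->tx_pending == 0)
		sc->timer = 0;	/* ring idle: disable the watchdog */
	return (progress);
}

/* Stand-in for foo_stop() followed by foo_init_locked(). */
static void
toy_reset(struct toy_softc *sc)
{
	sc->resets++;
	sc->tx_pending = 0;
	sc->tx_done_hw = 0;
	sc->timer = 0;
}

/* Once-a-second stats timer: check the transmit watchdog. */
static void
toy_tick(struct toy_softc *sc)
{
	if (sc->timer > 0) {
		sc->timer--;
		if (sc->timer == 0) {
			/* Scan the TX ring synchronously before giving up. */
			if (toy_txintr(sc)) {
				if (sc->tx_pending > 0)
					sc->timer = 1;	/* recheck next second */
				return;
			}
			toy_reset(sc);
		}
	}
}
```

 If two packets are queued and only one completion is later found by the 
 scan, the fifth tick re-arms the timer to 1 instead of resetting, and the 
 sixth tick resets because no further progress was made.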
 
 The one wrinkle that multiple queues throw into this is that it is hard to 
 know when your transmit interrupt routine can clear 'timer' to 0 to disable it 
 as you should only do so if all transmit queues are empty.  One easy solution 
 is to simply make the 'timer' field per transmit queue and have your stats 
 timer check each queue.  The remaining question then is when you actually 
 reset the chip.  I think you should do it as soon as one queue becomes "hung" 
 rather than waiting for all of the queues to be hung.
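 
 A minimal sketch of the per-queue variant, assuming a hypothetical 
 `toy_txq` structure that holds nothing but the countdown: the stats timer 
 walks every queue and reports a hang as soon as any single timer expires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-transmit-queue state: only the watchdog countdown. */
struct toy_txq {
	int	timer;		/* seconds until watchdog fires, 0 = off */
};

/*
 * Called once a second.  Decrements every armed timer and returns true
 * as soon as at least one queue has expired, i.e. the chip should be
 * reset without waiting for the other queues to hang as well.
 */
static bool
toy_watchdog_check(struct toy_txq *q, int nqueues)
{
	bool hung = false;

	assert(q != NULL && nqueues > 0);
	for (int i = 0; i < nqueues; i++) {
		if (q[i].timer > 0 && --q[i].timer == 0)
			hung = true;
	}
	return (hung);
}
```

 With timers of {0, 2, 5, 0}, the first call returns false and the second 
 returns true, even though one queue still has time left and two are idle.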
 
 The issues I see with the igb/ixgbe driver's watchdog handling:
 
 1) Rather than managing 'timer' as described above, they store the current
    walltime when a packet is transmitted and then try to decide in the stats
    timer if too much time has gone by.  This makes things more complicated as
    you have to deal with 'ticks' rolling over, etc.  Also, igb tries to
    determine this in the tx interrupt handler rather than doing it directly
    in the stats timer.
 
 2) At least igb(4) tries to set IFF_DRV_OACTIVE when it thinks the chip is in
    the hung state.  This is not what OACTIVE means and shouldn't be there at
    all.  OACTIVE should be set in the if_start() case if the transmit ring is
    full and should be cleared either when the chip is reset or if a transmit
    completion interrupt frees up TX descriptors.
 
 3) The watchdog handler queues a RX/TX task on every stats timer tick if
    there are any pending TX frames.  a) It should only do this if the
    queue is hung, and b) it should only do TX processing, and c) it
    doesn't need to schedule a task.
 
 4) At least igb(4) attempts to maintain a separate transmit queue status.
    I find that this actually makes things more complex and harder to
    understand and that it is simpler to check the relevant flag instead.
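
To illustrate the rollover concern in item 1, here is a minimal sketch of
what a timestamp-based check has to get right.  It uses uint32_t stand-ins
for the tick counter; the function names are hypothetical and this is not
the code in either driver.  A countdown timer avoids the question entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Naive check: compares absolute values, so it gives the wrong answer
 * whenever the counter (or stamp + timeout) wraps around.
 */
static bool
expired_naive(uint32_t now, uint32_t stamp, uint32_t timeout)
{
	return (now > stamp + timeout);
}

/*
 * Wrap-safe check: unsigned subtraction yields the elapsed ticks
 * modulo 2^32, which is correct as long as the real elapsed time is
 * less than one full wrap of the counter.
 */
static bool
expired_wrapsafe(uint32_t now, uint32_t stamp, uint32_t timeout)
{
	assert(timeout <= UINT32_MAX / 2);	/* keep timeouts far from a wrap */
	return ((uint32_t)(now - stamp) > timeout);
}
```

With stamp = 0xFFFFFFF0 and a timeout of 100 ticks, the naive form reports a
timeout after only 8 ticks because stamp + timeout wraps to a small number;
the subtraction form does not.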
 
 My patch for igb(4) to fix this watchdog handling is below.  The fixes to 
 ixgbe are probably similar since these drivers share many algorithms:
 
 Index: if_igb.c
 ===================================================================
 --- if_igb.c	(revision 248162)
 +++ if_igb.c	(working copy)
 @@ -856,7 +856,8 @@ igb_resume(device_t dev)
  			    !drbr_empty(ifp, txr->br))
  				igb_mq_start_locked(ifp, txr);
  #else
 -			if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 +			if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 +			    !IFQ_DRV_IS_EMPTY(&ifp->if_snd))
  				igb_start_locked(txr, ifp);
  #endif
  			IGB_TX_UNLOCK(txr);
 @@ -913,8 +914,10 @@ igb_start_locked(struct tx_ring *txr, struct ifnet
  		if (igb_xmit(txr, &m_head)) {
  			if (m_head != NULL)
  				IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
 -			if (txr->tx_avail <= IGB_MAX_SCATTER)
 +			if (txr->tx_avail <= IGB_MAX_SCATTER) {
  				txr->queue_status |= IGB_QUEUE_DEPLETED;
 +				ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 +			}
  			break;
  		}
  
 @@ -922,7 +925,7 @@ igb_start_locked(struct tx_ring *txr, struct ifnet
  		ETHER_BPF_MTAP(ifp, m_head);
  
  		/* Set watchdog on */
 -		txr->watchdog_time = ticks;
 +		txr->watchdog_time = IGB_WATCHDOG;
  		txr->queue_status |= IGB_QUEUE_WORKING;
  	}
  }
 @@ -1018,7 +1021,7 @@ igb_mq_start_locked(struct ifnet *ifp, struct tx_r
  	if (enq > 0) {
  		/* Set the watchdog */
  		txr->queue_status |= IGB_QUEUE_WORKING;
 -		txr->watchdog_time = ticks;
 +		txr->watchdog_time = IGB_WATCHDOG;
  	}
  	if (txr->tx_avail <= IGB_TX_CLEANUP_THRESHOLD)
  		igb_txeof(txr);
 @@ -1393,7 +1396,8 @@ igb_handle_que(void *context, int pending)
  		    !drbr_empty(ifp, txr->br))
  			igb_mq_start_locked(ifp, txr);
  #else
 -		if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 +		if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 +		    !IFQ_DRV_IS_EMPTY(&ifp->if_snd))
  			igb_start_locked(txr, ifp);
  #endif
  		IGB_TX_UNLOCK(txr);
 @@ -1444,7 +1448,8 @@ igb_handle_link_locked(struct adapter *adapter)
  			    !drbr_empty(ifp, txr->br))
  				igb_mq_start_locked(ifp, txr);
  #else
 -			if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 +			if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 +			    !IFQ_DRV_IS_EMPTY(&ifp->if_snd))
  				igb_start_locked(txr, ifp);
  #endif
  			IGB_TX_UNLOCK(txr);
 @@ -1581,7 +1586,8 @@ igb_msix_que(void *arg)
  	    !drbr_empty(ifp, txr->br))
  		igb_mq_start_locked(ifp, txr);
  #else
 -	if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 +	if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 +	    !IFQ_DRV_IS_EMPTY(&ifp->if_snd))
  		igb_start_locked(txr, ifp);
  #endif
  	IGB_TX_UNLOCK(txr);
 @@ -2055,7 +2061,7 @@ retry:
  	tx_buffer = &txr->tx_buffers[first];
  	tx_buffer->next_eop = last;
  	/* Update the watchdog time early and often */
 -	txr->watchdog_time = ticks;
 +	txr->watchdog_time = IGB_WATCHDOG;
  
  	/*
  	 * Advance the Transmit Descriptor Tail (TDT), this tells the E1000
 @@ -2174,10 +2180,9 @@ igb_local_timer(void *arg)
  {
  	struct adapter		*adapter = arg;
  	device_t		dev = adapter->dev;
 -	struct ifnet		*ifp = adapter->ifp;
  	struct tx_ring		*txr = adapter->tx_rings;
  	struct igb_queue	*que = adapter->queues;
 -	int			hung = 0, busy = 0;
 +	int			hung = 0;
  
  
  	IGB_CORE_LOCK_ASSERT(adapter);
 @@ -2185,28 +2190,28 @@ igb_local_timer(void *arg)
  	igb_update_link_status(adapter);
  	igb_update_stats_counters(adapter);
  
 +	/*
 +	 * Don't check for any TX timeouts if the adapter received
 +	 * pause frames since the last tick or if the link is down.
 +	 */
 +	if (adapter->pause_frames != 0 || adapter->link_active == 0)
 +		goto out;
 +
          /*
          ** Check the TX queues status
 -	**	- central locked handling of OACTIVE
 -	**	- watchdog only if all queues show hung
 +	**	- watchdog if any queue hangs
          */
  	for (int i = 0; i < adapter->num_queues; i++, que++, txr++) {
 -		if ((txr->queue_status & IGB_QUEUE_HUNG) &&
 -		    (adapter->pause_frames == 0))
 -			++hung;
 -		if (txr->queue_status & IGB_QUEUE_DEPLETED)
 -			++busy;
 -		if ((txr->queue_status & IGB_QUEUE_IDLE) == 0)
 -			taskqueue_enqueue(que->tq, &que->que_task);
 +		IGB_TX_LOCK(txr);
 +		if (txr->watchdog_time >= 0)
 +			if (--txr->watchdog_time == 0)
 +				++hung;
 +		IGB_TX_UNLOCK(txr);
  	}
 -	if (hung == adapter->num_queues)
 +	if (hung != 0)
  		goto timeout;
 -	if (busy == adapter->num_queues)
 -		ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 -	else if ((ifp->if_drv_flags & IFF_DRV_OACTIVE) &&
 -	    (busy < adapter->num_queues))
 -		ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
  
 +out:
  	adapter->pause_frames = 0;
  	callout_reset(&adapter->timer, hz, igb_local_timer, adapter);
  #ifndef DEVICE_POLLING
 @@ -2349,13 +2354,13 @@ igb_stop(void *arg)
  	callout_stop(&adapter->timer);
  
  	/* Tell the stack that the interface is no longer active */
 -	ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 -	ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 +	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
  
  	/* Disarm watchdog timer. */
  	for (int i = 0; i < adapter->num_queues; i++, txr++) {
  		IGB_TX_LOCK(txr);
  		txr->queue_status = IGB_QUEUE_IDLE;
 +		txr->watchdog_time = 0;
  		IGB_TX_UNLOCK(txr);
  	}
  
 @@ -3566,6 +3571,7 @@ igb_initialize_transmit_units(struct adapter *adap
  		    E1000_READ_REG(hw, E1000_TDBAL(i)),
  		    E1000_READ_REG(hw, E1000_TDLEN(i)));
  
 +		txr->watchdog_time = 0;
  		txr->queue_status = IGB_QUEUE_IDLE;
  
  		txdctl |= IGB_TX_PTHRESH;
 @@ -3930,7 +3936,7 @@ igb_txeof(struct tx_ring *txr)
                          	tx_buffer->m_head = NULL;
                  	}
  			tx_buffer->next_eop = -1;
 -			txr->watchdog_time = ticks;
 +			txr->watchdog_time = IGB_WATCHDOG;
  
  	                if (++first == adapter->num_tx_desc)
  				first = 0;
 @@ -3955,24 +3961,20 @@ igb_txeof(struct tx_ring *txr)
  
          txr->next_to_clean = first;
  
 -	/*
 -	** Watchdog calculation, we know there's
 -	** work outstanding or the first return
 -	** would have been taken, so none processed
 -	** for too long indicates a hang.
 -	*/
 -	if ((!processed) && ((ticks - txr->watchdog_time) > IGB_WATCHDOG))
 -		txr->queue_status |= IGB_QUEUE_HUNG;
          /*
           * If we have a minimum free,
           * clear depleted state bit
           */
 -        if (txr->tx_avail >= IGB_QUEUE_THRESHOLD)          
 +        if (txr->tx_avail >= IGB_QUEUE_THRESHOLD) {
                  txr->queue_status &= ~IGB_QUEUE_DEPLETED;
 +#if __FreeBSD_version >= 800000
 +		ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 +#endif
 +	}
  
  	/* All clean, turn off the watchdog */
  	if (txr->tx_avail == adapter->num_tx_desc) {
 -		txr->queue_status = IGB_QUEUE_IDLE;
 +		txr->watchdog_time = 0;
  		return (FALSE);
          }
  
 
 -- 
 John Baldwin

From: John Baldwin <jhb@freebsd.org>
To: freebsd-net@freebsd.org
Cc: Jack Vogel <jfvogel@gmail.com>,
 bug-followup@freebsd.org,
 Mike Karels <mike@karels.net>
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST
Date: Fri, 19 Apr 2013 12:27:09 -0400

 A second patch.  This is not something I mentioned before, but I had this in 
 my checkout.  In the legacy IRQ case this could also result in out-of-order 
 processing.  It also fixes a potential OACTIVE-stuck type bug that we used to 
 have in igb.  I have no way to test this, so it would be good if some other 
 folks could test this.
 
 The patch changes ixgbe_txeof() to return void and changes the few places
 that checked its return value to ignore it.  While it is true that ixgbe has
 a TX processing limit (which I think is dubious: TX completion processing is
 very cheap unlike RX processing, so it seems to me like it should always run
 to completion as in igb), in the common case I think the result of checking
 the return value is to do what igb used to do: poll the ring at 100% CPU
 (either in the interrupt handler or in the task it keeps rescheduling)
 waiting for pending TX packets to be completed (which is pointless: the host
 CPU can't make the NIC transmit packets any faster by polling).
 
 It also changes the interrupt handlers to restart packet transmission 
 synchronously rather than always deferring that to a task (the former is what 
 (nearly) all other drivers do).  It also fixes the interrupt handlers to be 
 consistent (one looped on txeof but not the others).  Without this change,
 the legacy interrupt handler could fail to restart packet transmission if
 there were no pending RX packets after rxeof returned and txeof fully
 cleaned its ring.
 
 It also fixes the legacy interrupt handler to not re-enable the interrupt if 
 it schedules the task but to wait until the task completes (re-enabling it
 early could result in concurrent, out-of-order RX processing).
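
The "either schedule the task or re-enable the interrupt, never both" rule
can be modelled as a tiny state machine.  This is a sketch only: struct
intr_model and the four functions are hypothetical names, and the model
assumes the interrupt source is masked while the handler runs.  It is not
the ixgbe code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt source; not the ixgbe structures. */
struct intr_model {
	bool intr_enabled;	/* the hardware may raise the interrupt */
	bool task_queued;	/* the deferred task is pending or running */
};

/* The interrupt fires; assume the source is masked on entry. */
static void
intr_enter(struct intr_model *m)
{
	assert(m->intr_enabled);
	m->intr_enabled = false;
}

/* End of the handler: 'more' is what the RX cleanup reported. */
static void
intr_exit(struct intr_model *m, bool more)
{
	assert(!m->intr_enabled);
	if (more)
		m->task_queued = true;	/* stay masked; the task takes over */
	else
		m->intr_enabled = true;
}

/* End of the deferred task: re-enable only once the work is done. */
static void
task_exit(struct intr_model *m, bool more)
{
	assert(m->task_queued && !m->intr_enabled);
	if (more)
		return;			/* task requeues itself, still masked */
	m->task_queued = false;
	m->intr_enabled = true;
}

/* True if the handler and the task could run at the same time. */
static bool
can_overlap(const struct intr_model *m)
{
	return (m->intr_enabled && m->task_queued);
}
```

As long as every exit path goes through intr_exit() or task_exit(),
can_overlap() never becomes true in this model.  Re-enabling the interrupt
unconditionally at the end of the handler would break that invariant.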
 
 Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c
 ===================================================================
 --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c	(revision 249553)
 +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c	(working copy)
 @@ -149,7 +149,7 @@
  static void     ixgbe_enable_intr(struct adapter *);
  static void     ixgbe_disable_intr(struct adapter *);
  static void     ixgbe_update_stats_counters(struct adapter *);
 -static bool	ixgbe_txeof(struct tx_ring *);
 +static void	ixgbe_txeof(struct tx_ring *);
  static bool	ixgbe_rxeof(struct ix_queue *);
  static void	ixgbe_rx_checksum(u32, struct mbuf *, u32);
  static void     ixgbe_set_promisc(struct adapter *);
 @@ -1431,7 +1414,10 @@
  	}
  
  	/* Reenable this interrupt */
 -	ixgbe_enable_queue(adapter, que->msix);
 +	if (que->res != NULL)
 +		ixgbe_enable_queue(adapter, que->msix);
 +	else
 +		ixgbe_enable_intr(adapter);
  	return;
  }
  
 @@ -1449,8 +1435,9 @@
  	struct adapter	*adapter = que->adapter;
  	struct ixgbe_hw	*hw = &adapter->hw;
  	struct 		tx_ring *txr = adapter->tx_rings;
 -	bool		more_tx, more_rx;
 -	u32       	reg_eicr, loop = MAX_LOOP;
 +	struct ifnet    *ifp = adapter->ifp;
 +	bool		more;
 +	u32       	reg_eicr;
  
  
  	reg_eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
 @@ -1461,17 +1448,19 @@
  		return;
  	}
  
 -	more_rx = ixgbe_rxeof(que);
 +	more = ixgbe_rxeof(que);
  
  	IXGBE_TX_LOCK(txr);
 -	do {
 -		more_tx = ixgbe_txeof(txr);
 -	} while (loop-- && more_tx);
 +	ixgbe_txeof(txr);
 +#if __FreeBSD_version >= 800000
 +	if (!drbr_empty(ifp, txr->br))
 +		ixgbe_mq_start_locked(ifp, txr, NULL);
 +#else
 +	if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 +		ixgbe_start_locked(txr, ifp);
 +#endif
  	IXGBE_TX_UNLOCK(txr);
  
 -	if (more_rx || more_tx)
 -		taskqueue_enqueue(que->tq, &que->que_task);
 -
  	/* Check for fan failure */
  	if ((hw->phy.media_type == ixgbe_media_type_copper) &&
  	    (reg_eicr & IXGBE_EICR_GPI_SDP1)) {
 @@ -1484,7 +1473,10 @@
  	if (reg_eicr & IXGBE_EICR_LSC)
  		taskqueue_enqueue(adapter->tq, &adapter->link_task);
  
 -	ixgbe_enable_intr(adapter);
 +	if (more)
 +		taskqueue_enqueue(que->tq, &que->que_task);
 +	else
 +		ixgbe_enable_intr(adapter);
  	return;
  }
  
 @@ -1501,27 +1493,24 @@
  	struct adapter  *adapter = que->adapter;
  	struct tx_ring	*txr = que->txr;
  	struct rx_ring	*rxr = que->rxr;
 -	bool		more_tx, more_rx;
 +	struct ifnet    *ifp = adapter->ifp;
 +	bool		more;
  	u32		newitr = 0;
  
  	ixgbe_disable_queue(adapter, que->msix);
  	++que->irqs;
  
 -	more_rx = ixgbe_rxeof(que);
 +	more = ixgbe_rxeof(que);
  
  	IXGBE_TX_LOCK(txr);
 -	more_tx = ixgbe_txeof(txr);
 -	/*
 -	** Make certain that if the stack 
 -	** has anything queued the task gets
 -	** scheduled to handle it.
 -	*/
 +	ixgbe_txeof(txr);
  #ifdef IXGBE_LEGACY_TX
  	if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd))
 +		ixgbe_start_locked(txr, ifp);
  #else
 -	if (!drbr_empty(adapter->ifp, txr->br))
 +	if (!drbr_empty(ifp, txr->br))
 +		ixgbe_mq_start_locked(ifp, txr, NULL);
  #endif
 -		more_tx = 1;
  	IXGBE_TX_UNLOCK(txr);
  
  	/* Do AIM now? */
 @@ -1575,7 +1564,7 @@
          rxr->packets = 0;
  
  no_calc:
 -	if (more_tx || more_rx)
 +	if (more)
  		taskqueue_enqueue(que->tq, &que->que_task);
  	else /* Reenable this interrupt */
  		ixgbe_enable_queue(adapter, que->msix);
 @@ -3557,7 +3545,7 @@
   *  tx_buffer is put back on the free queue.
   *
   **********************************************************************/
 -static bool
 +static void
  ixgbe_txeof(struct tx_ring *txr)
  {
  	struct adapter		*adapter = txr->adapter;
 @@ -3605,13 +3593,13 @@
  			IXGBE_CORE_UNLOCK(adapter);
  			IXGBE_TX_LOCK(txr);
  		}
 -		return FALSE;
 +		return;
  	}
  #endif /* DEV_NETMAP */
  
  	if (txr->tx_avail == txr->num_desc) {
  		txr->queue_status = IXGBE_QUEUE_IDLE;
 -		return FALSE;
 +		return;
  	}
  
  	/* Get work starting point */
 @@ -3705,12 +3693,8 @@
  	if ((!processed) && ((ticks - txr->watchdog_time) > IXGBE_WATCHDOG))
  		txr->queue_status = IXGBE_QUEUE_HUNG;
  
 -	if (txr->tx_avail == txr->num_desc) {
 +	if (txr->tx_avail == txr->num_desc)
  		txr->queue_status = IXGBE_QUEUE_IDLE;
 -		return (FALSE);
 -	}
 -
 -	return TRUE;
  }
  
  /*********************************************************************
 
 
 -- 
 John Baldwin

From: John Baldwin <jhb@freebsd.org>
To: freebsd-net@freebsd.org
Cc: Jack Vogel <jfvogel@gmail.com>,
 bug-followup@freebsd.org
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST
Date: Fri, 19 Apr 2013 12:09:11 -0400

 I want to make some progress on this, so let's break this up into smaller 
 parts.
 
 First, I think both calls to rearm_queues() should be removed.  In the case of 
 the local timer, this can only re-enable interrupts if the interrupt handler 
 is already scheduled or running or its associated task is running.  In the 
 last case this means the ithread can run concurrently with the interrupt 
 handler causing out-of-order processing.  The rxeof case has the same issue.  
 Normally the code calling rxeof is going to re-enable the interrupt if rxeof 
 runs to completion, and if not it is going to schedule the taskqueue.  The 
 effect of the rxeof change was to always re-enable interrupts before 
 scheduling the taskqueue which can result in those running concurrently.
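
As an illustration of leaving that decision to the caller, here is a minimal
sketch of a budgeted RX cleanup that only reports whether work remains.  The
names (rx_clean, DESC_DONE) and the flat status array are hypothetical
stand-ins, not the ixgbe descriptor layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_DONE 0x1u	/* stand-in for a "descriptor done" status bit */

/*
 * Process up to 'budget' completed descriptors starting at *next.
 * Returns true if the next descriptor is also complete, i.e. there is
 * still cleaning to do.  It does not touch interrupt state itself.
 */
static bool
rx_clean(uint32_t *status, size_t ndesc, size_t *next, int budget)
{
	size_t i = *next;

	assert(ndesc > 0 && i < ndesc);
	while (budget > 0 && (status[i] & DESC_DONE) != 0) {
		status[i] = 0;		/* hand the slot back to the "hardware" */
		i = (i + 1) % ndesc;
		budget--;
	}
	*next = i;
	return ((status[i] & DESC_DONE) != 0);
}
```

A caller would schedule its task when rx_clean() returns true and re-enable
the interrupt only when it returns false, so the two never run together.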
 
 Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c
 ===================================================================
 --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c	(revision 249553)
 +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c	(working copy)
 @@ -1386,23 +1386,6 @@
  	}
  }
  
 -static inline void
 -ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
 -{
 -	u32 mask;
 -
 -	if (adapter->hw.mac.type == ixgbe_mac_82598EB) {
 -		mask = (IXGBE_EIMS_RTX_QUEUE & queues);
 -		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, mask);
 -	} else {
 -		mask = (queues & 0xFFFFFFFF);
 -		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(0), mask);
 -		mask = (queues >> 32);
 -		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(1), mask);
 -	}
 -}
 -
 -
  static void
  ixgbe_handle_que(void *context, int pending)
  {
 @@ -2069,7 +2055,6 @@
                  goto watchdog;
  
  out:
 -	ixgbe_rearm_queues(adapter, adapter->que_mask);
  	callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
  	return;
  
 @@ -4596,14 +4577,8 @@
  
  	/*
  	** We still have cleaning to do?
 -	** Schedule another interrupt if so.
  	*/
 -	if ((staterr & IXGBE_RXD_STAT_DD) != 0) {
 -		ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
 -		return (TRUE);
 -	}
 -
 -	return (FALSE);
 +	return ((staterr & IXGBE_RXD_STAT_DD) != 0);
  }
  
  
 -- 
 John Baldwin

From: Jack Vogel <jfvogel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, bug-followup@freebsd.org, 
	Mike Karels <mike@karels.net>
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving
 out-of-order packet process and spurious RST
Date: Fri, 19 Apr 2013 12:32:59 -0700

 
 Thanks John, I'm incorporating your changes into my source tree. I also plan
 on changing the "glue" between mq_start and mq_start_locked on igb after some
 UDP testing that was done, and believe ixgbe should follow suit. Results
 there have shown the latency is just too high if I only use the
 task_enqueue... What works best is to always queue to the buf ring, but then
 also always do the TRY_LOCK. I will update HEAD as soon as I handle an
 internal firedrill I have today :)
 
 Jack
 
 
 
 
 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr-&gt;queue_status =3D IXGBE_QUEUE_IDLE;<=
 br>
 - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return FALSE;<br>
 + =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;<br>
 =A0 =A0 =A0 =A0 }<br>
 <br>
 =A0 =A0 =A0 =A0 /* Get work starting point */<br>
 @@ -3705,12 +3693,8 @@<br>
 =A0 =A0 =A0 =A0 if ((!processed) &amp;&amp; ((ticks - txr-&gt;watchdog_time=
 ) &gt; IXGBE_WATCHDOG))<br>
 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr-&gt;queue_status =3D IXGBE_QUEUE_HUNG;<=
 br>
 <br>
 - =A0 =A0 =A0 if (txr-&gt;tx_avail =3D=3D txr-&gt;num_desc) {<br>
 + =A0 =A0 =A0 if (txr-&gt;tx_avail =3D=3D txr-&gt;num_desc)<br>
 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr-&gt;queue_status =3D IXGBE_QUEUE_IDLE;<=
 br>
 - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (FALSE);<br>
 - =A0 =A0 =A0 }<br>
 -<br>
 - =A0 =A0 =A0 return TRUE;<br>
 =A0}<br>
 <br>
 =A0/*********************************************************************<b=
 r>
 <span class=3D"HOEnZb"><font color=3D"#888888"><br>
 <br>
 --<br>
 John Baldwin<br>
 </font></span></blockquote></div><br></div>
 
 --14dae9cdc48767db0904dabbc98d--

From: "Charbon, Julien" <jcharbon@verisign.com>
To: bug-followup@freebsd.org, jcharbon@verisign.com
Cc: John Baldwin <jhb@freebsd.org>
Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order
 packet process and spurious RST
Date: Thu, 05 Sep 2013 15:05:15 +0200

 This is a multi-part message in MIME format.
 --------------090007000606090509060801
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit
 
 
   Just a PR update: this issue is fixed in releng/9.2 (9.2-RC2 and 
 later), mainly by these two commits:
 
 - Fix the local timer watchdog, which called taskqueue_enqueue() with 
 &que->que_task instead of &txr->txq_task (a one-line change in 
 ixgbe_local_timer()):
 http://svnweb.freebsd.org/base?view=revision&revision=251964
 
 - Stop calling ixgbe_rearm_queues(), then remove it:
 http://svnweb.freebsd.org/base?view=revision&revision=253865
 
   These changes have not (yet) reached stable/8.  Attached is the 
 current patch for releng/8.4.  Thanks to John Baldwin for accepting and 
 pushing these changes.
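 
   For readers of this PR, here is a minimal sketch of the pattern the 
 ixgbe_rearm_queues() removal appears to converge on, as far as can be 
 read from the diffs in this thread: after each pass over the ring, the 
 handler either reschedules the deferred task or re-enables the queue 
 interrupt, never both, so only one context works on the ring at a time. 
 Everything below (struct fake_queue, fake_rxeof(), fake_handle_que(), 
 fake_drain(), the counters) is hypothetical and only models the control 
 flow; it is not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver queue; not the real ixgbe structures. */
struct fake_queue {
	int	pending;	/* descriptors still waiting in the RX ring */
	int	task_enqueued;	/* times the deferred task was rescheduled */
	int	intr_enabled;	/* times the queue interrupt was re-enabled */
};

/* Consume up to 'limit' descriptors; return true if work remains. */
bool
fake_rxeof(struct fake_queue *q, int limit)
{
	int n = (q->pending < limit) ? q->pending : limit;

	q->pending -= n;
	assert(q->pending >= 0);
	return (q->pending > 0);
}

/*
 * One pass of the handler.  Exactly one of the two follow-up actions is
 * taken: reschedule the task (more work) or re-enable the interrupt.
 * Doing both is what would let two contexts process the same ring.
 */
void
fake_handle_que(struct fake_queue *q, int limit)
{
	bool more = fake_rxeof(q, limit);

	if (more)
		q->task_enqueued++;	/* models taskqueue_enqueue() */
	else
		q->intr_enabled++;	/* models ixgbe_enable_queue() */
}

/* Run the handler until the ring is empty; return the number of passes. */
int
fake_drain(struct fake_queue *q, int limit)
{
	int passes = 0;

	do {
		fake_handle_que(q, limit);
		passes++;
	} while (q->pending > 0);
	return (passes);
}
```

   With 250 pending descriptors and a limit of 100 per pass, 
 fake_drain() takes three passes: the task is rescheduled twice and the 
 interrupt is re-enabled once, on the final pass.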
 
 --
 Julien Charbon
 
 --------------090007000606090509060801
 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0";
  name="releng-8.4.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="releng-8.4.patch"
 
 diff --git a/sys/dev/ixgbe/ixgbe.c b/sys/dev/ixgbe/ixgbe.c
 index df37621..9a00517 100644
 --- a/sys/dev/ixgbe/ixgbe.c
 +++ b/sys/dev/ixgbe/ixgbe.c
 @@ -1396,23 +1396,6 @@ ixgbe_disable_queue(struct adapter *adapter, u32 vector)
  	}
  }
  
 -static inline void
 -ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
 -{
 -	u32 mask;
 -
 -	if (adapter->hw.mac.type == ixgbe_mac_82598EB) {
 -		mask = (IXGBE_EIMS_RTX_QUEUE & queues);
 -		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, mask);
 -	} else {
 -		mask = (queues & 0xFFFFFFFF);
 -		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(0), mask);
 -		mask = (queues >> 32);
 -		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS_EX(1), mask);
 -	}
 -}
 -
 -
  static void
  ixgbe_handle_que(void *context, int pending)
  {
 @@ -2046,14 +2029,13 @@ ixgbe_local_timer(void *arg)
  		    (paused == 0))
  			++hung;
  		else if (txr->queue_status == IXGBE_QUEUE_WORKING)
 -			taskqueue_enqueue(que->tq, &que->que_task);
 +			taskqueue_enqueue(que->tq, &txr->txq_task);
          }
  	/* Only truely watchdog if all queues show hung */
          if (hung == adapter->num_tx_queues)
                  goto watchdog;
  
  out:
 -	ixgbe_rearm_queues(adapter, adapter->que_mask);
  	callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
  	return;
  
 @@ -4559,7 +4541,6 @@ next_desc:
  	** Schedule another interrupt if so.
  	*/
  	if ((staterr & IXGBE_RXD_STAT_DD) != 0) {
 -		ixgbe_rearm_queues(adapter, (u64)(1 << que->msix));
  		return (TRUE);
  	}
  
 
 --------------090007000606090509060801--
Responsible-Changed-From-To: freebsd-net->jvf 
Responsible-Changed-By: jmg 
Responsible-Changed-When: Tue Jan 28 19:56:21 UTC 2014 
Responsible-Changed-Why:  
assign this to Jack so he gets bugged about it weekly.. :) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=176446 
Responsible-Changed-From-To: jvf->jfv 
Responsible-Changed-By: jmg 
Responsible-Changed-When: Tue Jan 28 20:31:04 UTC 2014 
Responsible-Changed-Why:  
ugh, fix the name.. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=176446 

From: "Bentkofsky, Michael" <MBentkofsky@verisign.com>
To: "freebsd-net@freebsd.org" <freebsd-net@FreeBSD.org>
Cc:  
Subject: RE: kern/176446: [netinet] [patch] Concurrency in ixgbe
Date: Wed, 29 Jan 2014 18:08:33 +0000

 I believe this has been fixed in r240968.
 
 _______________________________________________
 freebsd-net@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-net
 To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>Unformatted:
