From nobody@FreeBSD.org  Fri Jan 16 13:35:22 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6E4DD16A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Jan 2004 13:35:22 -0800 (PST)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C6CB443D31
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Jan 2004 13:35:17 -0800 (PST)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.10/8.12.10) with ESMTP id i0GLZHdL036055
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Jan 2004 13:35:17 -0800 (PST)
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.10/8.12.10/Submit) id i0GLZHAh036043;
	Fri, 16 Jan 2004 13:35:17 -0800 (PST)
	(envelope-from nobody)
Message-Id: <200401162135.i0GLZHAh036043@www.freebsd.org>
Date: Fri, 16 Jan 2004 13:35:17 -0800 (PST)
From: Tim Draegen-Gilman <tim@eudaemon.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: RealTek driver (rl) fails to adjust TX threshold on TX underrun
X-Send-Pr-Version: www-2.0

>Number:         61448
>Category:       kern
>Synopsis:       [patch] RealTek driver rl(4) fails to adjust TX threshold on TX underrun
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    mlaier
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jan 16 13:40:05 PST 2004
>Closed-Date:    Fri Feb 18 15:59:52 GMT 2005
>Last-Modified:  Fri Feb 18 15:59:52 GMT 2005
>Originator:     Tim Draegen-Gilman
>Release:        4.8
>Organization:
Vernier Networks, Inc.
>Environment:
FreeBSD 4.8-RELEASE i386
>Description:
The rl driver fails to properly detect the TX underflow (underrun) condition.  This stems from an ambiguity in the RealTek documentation.  Because the underflow goes undetected, the driver is never able to increase the TX threshold, so the NIC continues to underflow over time.
This results in malformed packets appearing on the wire.
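
The effect can be illustrated with a small standalone sketch.  This is not the
driver code: the bit values and both helper functions are assumptions made up
for illustration, loosely modelled on the status bits the driver tests.  It
only shows that if underrun is examined solely when the OK bit is clear, a
status word carrying both bits never raises the threshold.

```c
#include <stdint.h>

/* Illustrative bit values only; assumed, not copied from the driver headers. */
#define TX_UNDERRUN	0x00004000u
#define TX_OK		0x00008000u

/* Old behaviour (sketch): underrun is only examined when TX_OK is clear. */
static int
thresh_old(uint32_t txstat, int thresh)
{
	if (txstat & TX_OK)
		return (thresh);	/* counted as a good packet, nothing else */
	if (txstat & TX_UNDERRUN)
		return (thresh + 32);
	return (thresh);
}

/* Intended behaviour (sketch): underrun is examined regardless of TX_OK. */
static int
thresh_new(uint32_t txstat, int thresh)
{
	if (txstat & TX_UNDERRUN)
		return (thresh + 32);
	return (thresh);
}
```

With both bits set and a threshold of 96, thresh_old() returns 96 unchanged
while thresh_new() returns 128.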
>How-To-Repeat:
1.  Get yourself a RealTek 8139-based NIC.
2.  Connect your NIC to a Cisco switch.
3.  Watch as the Cisco reports CRC errors.
4.  The CRC errors exactly match the unaddressed TX underflow condition on the RealTek chip.
>Fix:
The following patch corrects this problem.  It adds comments, deals with the
underflow condition while processing a "TX_OK" bit (this is the documentation
flaw -- underflow and OK are both set at the same time), increases the TX threshold
step from 32 to 64, adds a sanity check to provide a TX threshold ceiling, and
avoids the reset/init cycle of the chip on output error (which is a bit heavy-handed).
The two debug printfs can be removed (but are necessary to show that the
"underrun" and "OK" bits are both set at the same time).
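
The ceiling arithmetic can be sketched on its own as follows.  This is a
minimal illustration, not part of the patch: the macro and function names are
hypothetical, and the only facts taken from the patch are the starting value
of 96, the limit of 2016, and the 6-bit field counted in units of 32 bytes.

```c
#include <assert.h>

#define THRESH_UNIT	32	/* bytes per threshold unit (6-bit field) */
#define THRESH_START	96	/* 000011 binary, i.e. 3 * 32 */
#define THRESH_MAX	2016	/* 111111 binary, i.e. 63 * 32 */

/* One underrun bump with a ceiling check; step is a hypothetical parameter. */
static int
thresh_bump(int thresh, int step)
{
	if (thresh < THRESH_MAX)
		thresh += step;
	return (thresh);
}

/* Convert a byte threshold into the 6-bit field value. */
static unsigned
thresh_field(int thresh)
{
	assert(thresh % THRESH_UNIT == 0);
	assert(thresh >= THRESH_UNIT && thresh <= THRESH_MAX);
	return ((unsigned)thresh / THRESH_UNIT);
}

/* Count the bumps needed to climb from THRESH_START to the ceiling. */
static int
bumps_to_ceiling(int step)
{
	int n = 0, t = THRESH_START;

	while (t < THRESH_MAX) {
		t = thresh_bump(t, step);
		n++;
	}
	return (n);
}
```

Starting from 96, a step of 64 reaches 2016 after 30 bumps and a step of 32
after 60.  The "< 2016" check lands exactly on the ceiling only because
2016 - 96 is a multiple of both step sizes; a different starting value could
overshoot by up to one step.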

Index: freebsd/sys/pci/if_rl.c
===================================================================
RCS file: /home/cvs/ambit2/freebsd/sys/pci/if_rl.c,v
retrieving revision 1.4
diff -u -r1.4 if_rl.c
--- freebsd/sys/pci/if_rl.c     20 Jun 2003 04:24:31 -0000      1.4
+++ freebsd/sys/pci/if_rl.c     16 Jan 2004 02:36:51 -0000
@@ -1209,36 +1209,48 @@
         * frames that have been uploaded.
         */
        do {
+               /* Grab the TX status */
                txstat = CSR_READ_4(sc, RL_LAST_TXSTAT(sc));
+
+               /* Only deal with OK, Underrun, and Abort conditions */
                if (!(txstat & (RL_TXSTAT_TX_OK|
                    RL_TXSTAT_TX_UNDERRUN|RL_TXSTAT_TXABRT)))
                        break;
 
-               ifp->if_collisions += (txstat & RL_TXSTAT_COLLCNT) >> 24;
-
+               /* Free up memory */
                if (RL_LAST_TXMBUF(sc) != NULL) {
                        m_freem(RL_LAST_TXMBUF(sc));
                        RL_LAST_TXMBUF(sc) = NULL;
                }
-               if (txstat & RL_TXSTAT_TX_OK)
-                       ifp->if_opackets++;
-               else {
-                       int                     oldthresh;
+
+               /* Deal with abort/out-of-window condition */
+               if (txstat & (RL_TXSTAT_OUTOFWIN | RL_TXSTAT_TXABRT)) {
                        ifp->if_oerrors++;
-                       if ((txstat & RL_TXSTAT_TXABRT) ||
-                           (txstat & RL_TXSTAT_OUTOFWIN))
-                               CSR_WRITE_4(sc, RL_TXCFG, RL_TXCFG_CONFIG);
-                       oldthresh = sc->rl_txthresh;
-                       /* error recovery */
-                       rl_reset(sc);
-                       rl_init(sc);
+
+       printf("rl_txeof error: TSR is %x\n", txstat);
+
+                       /* If abort-condition, clear the abort */
+                       if (txstat & RL_TXSTAT_TXABRT) {
+                               CSR_WRITE_4(sc, RL_TXCFG, RL_TXCFG_CLRABRT);
+                       }
+               } else {
+                       ifp->if_opackets++;
+
                        /*
-                        * If there was a transmit underrun,
-                        * bump the TX threshold.
+                        * Update the TX threshold if underrun is reported.
+                        * Datasheet says threshold must be between 000001 and
+                        * 111111 inclusive.  We start at 000011 (96).
                         */
-                       if (txstat & RL_TXSTAT_TX_UNDERRUN)
-                               sc->rl_txthresh = oldthresh + 32;
-                       return;
+                       if ((txstat & RL_TXSTAT_TX_UNDERRUN) &&
+                           (sc->rl_txthresh < 2016)) {
+                               sc->rl_txthresh += 64;
+       printf("rl_txeof underrun: TSR is %x, txthresh now %d\n", txstat,
+           sc->rl_txthresh);
+                       }
+
+                       /* Update collision count, if any */
+                       ifp->if_collisions +=
+                           (txstat & RL_TXSTAT_COLLCNT) >> 24;
                }
                RL_INC(sc->rl_cdata.last_tx);
                ifp->if_flags &= ~IFF_OACTIVE;

>Release-Note:
>Audit-Trail:

From: Andrew Belashov <bel@orel.ru>
To: freebsd-gnats-submit@FreeBSD.org, tim@eudaemon.net,
	wpaul@FreeBSD.org
Cc:  
Subject: Re: kern/61448: [patch] RealTek driver rl(4) fails to adjust TX threshold
 on TX underrun
Date: Mon, 18 Oct 2004 15:37:07 +0400

 This is a multi-part message in MIME format.
 --------------000302050404030206090101
 Content-Type: text/plain; charset=us-ascii; format=flowed
 Content-Transfer-Encoding: 7bit
 
 I have the same problem with a CardBus PCMCIA card.
 
 Hardware: Fujitsu FMV-BIBLIO NU13 notebook (Pentium I, 133MHz);
 TRENDnet TE100-PCBUSR Cardbus Ethernet card based on RealTek 8139 chip.
 
 The network card locks up when transferring files through ftp or scp.
 Strange messages appear on the console:
 
 ---[dmesg]---
 rl0: discard oversize frame (ether type 6269 flags 3 len 25696 > max 1514)
 rl0: discard oversize frame (ether type 3423 flags 3 len 12589 > max 1514)
 rl0: discard oversize frame (ether type 2f62 flags 3 len 25387 > max 1514)
 ---[dmesg]---
 
 I have added some debug code (thanks to ktr(4)) to if_rl.c. For example,
 see: <http://www.orel.ru/~bel/patches/if_rl_debug.patch>. This patch also
 resolves my problem.
 
 I offer an adapted patch for -CURRENT. See the attachment.
 
 With the attached patch I do not see any errors or problems on my slow
 notebook. The FTP transfer rate now ranges from 1.6 MByte/s to 2.7 MByte/s
 in either direction.
 
 Note: I do not know how to test the patched driver for error recovery from
 the RL_TXSTAT_TXABRT or RL_TXSTAT_OUTOFWIN error states.
 
 --
 Best regards,
 Andrew Belashov.
 
 --------------000302050404030206090101
 Content-Type: text/plain;
  name="current.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="current.patch"
 
 --- sys/pci/if_rl.c.orig	Thu Sep  9 14:39:52 2004
 +++ sys/pci/if_rl.c	Sun Oct 17 21:49:44 2004
 @@ -1226,31 +1226,33 @@ rl_txeof(struct rl_softc *sc)
  		    RL_TXSTAT_TX_UNDERRUN|RL_TXSTAT_TXABRT)))
  			break;
  
 -		ifp->if_collisions += (txstat & RL_TXSTAT_COLLCNT) >> 24;
 -
  		bus_dmamap_unload(sc->rl_tag, RL_LAST_DMAMAP(sc));
  		bus_dmamap_destroy(sc->rl_tag, RL_LAST_DMAMAP(sc));
  		m_freem(RL_LAST_TXMBUF(sc));
  		RL_LAST_TXMBUF(sc) = NULL;
 -		if (txstat & RL_TXSTAT_TX_OK)
 -			ifp->if_opackets++;
 -		else {
 -			int			oldthresh;
 +		if (txstat & (RL_TXSTAT_OUTOFWIN | RL_TXSTAT_TXABRT)) {
  			ifp->if_oerrors++;
 -			if ((txstat & RL_TXSTAT_TXABRT) ||
 -			    (txstat & RL_TXSTAT_OUTOFWIN))
 -				CSR_WRITE_4(sc, RL_TXCFG, RL_TXCFG_CONFIG);
 -			oldthresh = sc->rl_txthresh;
 -			/* error recovery */
 -			rl_reset(sc);
 -			rl_init_locked(sc);
 +
 +			/* If abort-condition, clear the abort */
 +			if (txstat & RL_TXSTAT_TXABRT) {
 +		 		CSR_WRITE_4(sc, RL_TXCFG, RL_TXCFG_CLRABRT);
 +			}
 +		} else {
 +			ifp->if_opackets++;
 +
  			/*
 -			 * If there was a transmit underrun,
 -			 * bump the TX threshold.
 +			 * Update the TX threshold if underrun is reported.
 +			 * Datasheet says threshold must be between 000001 and
 +			 * 111111 inclusive.  We start at 000011 (96).
  			 */
 -			if (txstat & RL_TXSTAT_TX_UNDERRUN)
 -				sc->rl_txthresh = oldthresh + 32;
 -			return;
 +			if ((txstat & RL_TXSTAT_TX_UNDERRUN) &&
 +			    (sc->rl_txthresh < 2016)) {
 +				sc->rl_txthresh += 64;
 +			}
 +
 +			/* Update collision count, if any */
 +			ifp->if_collisions +=
 +			    (txstat & RL_TXSTAT_COLLCNT) >> 24;
  		}
  		RL_INC(sc->rl_cdata.last_tx);
  		ifp->if_flags &= ~IFF_OACTIVE;
 
 --------------000302050404030206090101--
Responsible-Changed-From-To: freebsd-bugs->mlaier 
Responsible-Changed-By: mlaier 
Responsible-Changed-When: Thu Feb 3 17:48:24 GMT 2005 
Responsible-Changed-Why:  
Take over while investigating ALTQ problems with rl(4). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=61448 
State-Changed-From-To: open->feedback 
State-Changed-By: mlaier 
State-Changed-When: Tue Feb 8 23:01:53 GMT 2005 
State-Changed-Why:  
According to the RealTek programming guide for the 8139, it is possible to
have either TOK=1 or TOK=0 when TUN=1 is set.  This indicates whether or not
the chip was able to retransmit.  Removing the reset/init cycle would be nice,
but I don't think it's a good idea, as it might break older cards supported by
this driver.  The same goes for increasing the threshold step.
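
The TOK/TUN combinations can be sketched as a small classifier.  This is an
illustration only: the bit values, the enum, and the function names are
assumptions invented for this sketch, and reading TOK=1 with TUN=1 as
"retransmitted successfully" is an interpretation of the statement above.

```c
#include <stdint.h>

/* Illustrative bit values; assumed for this sketch. */
#define TSD_TUN	0x00004000u	/* transmit FIFO underrun */
#define TSD_TOK	0x00008000u	/* transmit OK */

enum tx_result {
	TX_PENDING,		/* neither bit set yet */
	TX_CLEAN,		/* TOK only */
	TX_OK_AFTER_UNDERRUN,	/* TOK and TUN: underrun, then success */
	TX_FAILED_UNDERRUN	/* TUN only: underrun, no success */
};

/* Map the two status bits onto the four possible outcomes. */
static enum tx_result
classify_tsd(uint32_t tsd)
{
	int tok = (tsd & TSD_TOK) != 0;
	int tun = (tsd & TSD_TUN) != 0;

	if (tok && tun)
		return (TX_OK_AFTER_UNDERRUN);
	if (tok)
		return (TX_CLEAN);
	if (tun)
		return (TX_FAILED_UNDERRUN);
	return (TX_PENDING);
}

/* An underrun in either form is a reason to raise the TX threshold. */
static int
should_raise_threshold(uint32_t tsd)
{
	return ((tsd & TSD_TUN) != 0);
}
```

Under these assumptions, should_raise_threshold() is true for both underrun
outcomes, whether or not TOK is also set.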

Please test the following, less intrusive patch and give feedback.  Thanks. 

Also available from: http://people.freebsd.org/~mlaier/if_rl.c.PR.patch 

diff -u -r1.146 if_rl.c 
--- if_rl.c	7 Jan 2005 02:29:18 -0000	1.146 
+++ if_rl.c	8 Feb 2005 22:46:10 -0000 
@@ -1232,6 +1232,14 @@ 
 		bus_dmamap_destroy(sc->rl_tag, RL_LAST_DMAMAP(sc));
 		m_freem(RL_LAST_TXMBUF(sc));
 		RL_LAST_TXMBUF(sc) = NULL;
+		/* 
+		 * If there was a transmit underrun, bump the TX threshold. 
+		 * Make sure not to overflow the 63 * 32 bytes we can address
+		 * with the 6 available bits.
+		 */ 
+		if ((txstat & RL_TXSTAT_TX_UNDERRUN) && 
+		    (sc->rl_txthresh < 2016)) 
+			sc->rl_txthresh += 32; 
 		if (txstat & RL_TXSTAT_TX_OK)
 			ifp->if_opackets++;
 		else {
@@ -1244,12 +1252,8 @@ 
 			/* error recovery */
 			rl_reset(sc);
 			rl_init_locked(sc);
-			/* 
-			 * If there was a transmit underrun, 
-			 * bump the TX threshold. 
-			 */ 
-			if (txstat & RL_TXSTAT_TX_UNDERRUN) 
-				sc->rl_txthresh = oldthresh + 32; 
+			/* restore original threshold */ 
+			sc->rl_txthresh = oldthresh; 
 			return;
 		}
 		RL_INC(sc->rl_cdata.last_tx);

State-Changed-From-To: feedback->patched 
State-Changed-By: mlaier 
State-Changed-When: Fri Feb 11 01:18:09 GMT 2005 
State-Changed-Why:  
Committed to HEAD.  MFC is due in 1 week already, in order to catch the 5.4 branch.
Please test now!

State-Changed-From-To: patched->closed 
State-Changed-By: mlaier 
State-Changed-When: Fri Feb 18 15:59:16 GMT 2005 
State-Changed-Why:  
The change has been MFCed to RELENG_5.  Thanks.

>Unformatted:
