From nobody@FreeBSD.org  Mon Dec  1 22:28:05 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D3D401065673
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  1 Dec 2008 22:28:05 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id C6E948FC0C
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  1 Dec 2008 22:28:05 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id mB1MS5RD019984
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 1 Dec 2008 22:28:05 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id mB1MS5x4019983;
	Mon, 1 Dec 2008 22:28:05 GMT
	(envelope-from nobody)
Message-Id: <200812012228.mB1MS5x4019983@www.freebsd.org>
Date: Mon, 1 Dec 2008 22:28:05 GMT
From: Ping Mai <pingmai@yahoo.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: xl0 watchdog timeout
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         129352
>Category:       kern
>Synopsis:       [xl] [patch] xl0 watchdog timeout
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    yongari
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 01 22:30:00 UTC 2008
>Closed-Date:    
>Last-Modified:  Sat Sep 25 22:51:06 UTC 2010
>Originator:     Ping Mai
>Release:        RELENG_6
>Organization:
Steps Consulting
>Environment:
FreeBSD agra.pinelake.stepnet.com 6.4-PRERELEASE FreeBSD 6.4-PRERELEASE #4: Fri Nov 28 14:10:54 PST 2008     root@agra.pinelake.stepnet.com:/usr/src/sys/i386/compile/AGRA  i386

>Description:
Ever since upgrading from 5-STABLE to 6-STABLE, xl0 gets watchdog timeouts and the NIC resets:
Sep 12 10:20:52 agra kernel: xl0: watchdog timeout
Sep 12 10:20:52 agra kernel: xl0: link state changed to DOWN
Sep 12 10:20:54 agra kernel: xl0: link state changed to UP
This happens 2-3 times a day.





>How-To-Repeat:
Dell Inspiron 8200 with built-in NIC
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec80-0xecff mem 0xf8fffc00-0xf8fffc7f irq 11 at device 0.0 on pci2

Running 6-STABLE
>Fix:
Make xl_txeof() and xl_txeof_90xB() restart the watchdog timer (xl_wdog_timer = 5)
whenever they have reclaimed at least one transmitted packet.
See the attached patch.


Patch attached with submission follows:

Index: /usr/src/sys/pci/if_xl.c
===================================================================
RCS file: /local/fbsdcvs/src/sys/pci/Attic/if_xl.c,v
retrieving revision 1.190.2.12
diff -c -r1.190.2.12 if_xl.c
*** /usr/src/sys/pci/if_xl.c	23 Apr 2008 21:38:29 -0000	1.190.2.12
--- /usr/src/sys/pci/if_xl.c	28 Nov 2008 22:10:36 -0000
***************
*** 2079,2084 ****
--- 2079,2085 ----
  {
  	struct xl_chain		*cur_tx;
  	struct ifnet		*ifp = sc->xl_ifp;
+ 	u_long			opkts = ifp->if_opackets;
  
  	XL_LOCK_ASSERT(sc);
  
***************
*** 2120,2125 ****
--- 2121,2128 ----
  				sc->xl_cdata.xl_tx_head->xl_phys);
  			CSR_WRITE_2(sc, XL_COMMAND, XL_CMD_DOWN_UNSTALL);
  		}
+ 		if (opkts != ifp->if_opackets)
+ 			sc->xl_wdog_timer = 5;
  	}
  }
  
***************
*** 2129,2134 ****
--- 2132,2138 ----
  	struct xl_chain		*cur_tx = NULL;
  	struct ifnet		*ifp = sc->xl_ifp;
  	int			idx;
+ 	u_long			opkts = ifp->if_opackets;
  
  	XL_LOCK_ASSERT(sc);
  
***************
*** 2158,2163 ****
--- 2162,2169 ----
  
  	if (sc->xl_cdata.xl_tx_cnt == 0)
  		sc->xl_wdog_timer = 0;
+ 	else if (opkts != ifp->if_opackets)
+ 		sc->xl_wdog_timer = 5;
  	sc->xl_cdata.xl_tx_cons = idx;
  
  	if (cur_tx != NULL)


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Dec 1 22:53:23 UTC 2008 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=129352 

From: "Reko Turja" <reko.turja@liukuma.net>
To: <bug-followup@FreeBSD.org>,
	<pingmai@yahoo.com>
Cc:  
Subject: Re: kern/129352: [xl] [patch] xl0 watchdog timeout
Date: Tue, 4 Aug 2009 13:10:07 +0300

 After updating my firewall box to 7.2-STABLE, I started getting
 watchdog timeouts on the xl interface, with the link state going down
 and coming back up after a couple of seconds. Searching GNATS turned up
 this bug report.
 
 I applied the patch successfully on:
 
 src/sys/pci/if_xl.c,v 1.210.2.2 2008/04/23 21:28:29
 
 and will run it for a few days to see whether it breaks anything
 or whether the watchdog timeouts keep occurring. So far, rebooting
 with the fresh kernel seems to be OK.
 
 

From: Reko Turja <ignatz@liukuma.net>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/129352: [xl] [patch] xl0 watchdog timeout
Date: Sun, 9 Aug 2009 14:56:44 +0300 (EEST)

 After running the patch from this PR for a few days, I still got some
 watchdog timeouts. As another approach, I'm trying driver revision 1.4 from
 /src/sys/dev/xl/ (8.x source tree), which compiled cleanly on my system with
 sources updated as of today.
 
 My uname -a:
 
 FreeBSD xxx.org 7.2-STABLE FreeBSD 7.2-STABLE #10: Sun Aug  9 14:13:47 
 EEST 2009     root@xxx.org:/usr/obj/usr/src/sys/MORIA  i386
 
 The reason for trying 1.4 was the commit message:
 
 SVN rev 191345 on 2009-04-21 00:42:11Z by yongari
 
 To make it easy to tell whether xl(4) missed a Tx completion interrupt,
 check the number of queued packets in the watchdog timeout handler. If
 there are no queued packets, just print an informational message and
 return without resetting the controller. Also fix to invoke the correct
 Tx completion handler, as the 3C905B needs a different handler.
 
 I will send a follow-up after testing the driver for a while. If it seems
 to work, is there any chance of backporting it to 7.x?
 
 -Reko
Responsible-Changed-From-To: freebsd-net->yongari 
Responsible-Changed-By: andre 
Responsible-Changed-When: Mon Aug 23 17:57:17 UTC 2010 
Responsible-Changed-Why:  
Over to expert. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=129352 
State-Changed-From-To: open->feedback 
State-Changed-By: yongari 
State-Changed-When: Tue Sep 21 18:43:33 UTC 2010 
State-Changed-Why:  
Is this still an issue on FreeBSD 8.1-RELEASE or 7.3-RELEASE? 
To the original submitter: 
I'm under the impression that the patch just disables the watchdog 
timeout detection logic of the driver. What we need to know here is 
why it triggers watchdog timeouts (e.g., driver bug, silicon bug, etc.). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=129352 

From: Pyun YongHyeon <pyunyh@gmail.com>
To: Ping Mai <pingmai@yahoo.com>
Cc: yongari@FreeBSD.org, bug-followup@FreeBSD.org
Subject: Re: kern/129352: [xl] [patch] xl0 watchdog timeout
Date: Wed, 22 Sep 2010 17:26:04 -0700

 On Wed, Sep 22, 2010 at 04:29:20PM -0700, Ping Mai wrote:
 > I'm traveling in South America at the moment and do not have easy access.
 > Just by looking at the code snippet in the PR, and from what I can remember, the
 > problem was that xl0 would frequently reset itself on its own.
 > 
 > The few lines that I added set the watchdog timeout to 5 if any packet had
 > been sent.
 
 The watchdog tracks the time of the latest transmission attempt.
 So when you send more packets while the watchdog timer is armed,
 the timer is reloaded on every transmission attempt.
 
 The watchdog timeout should be set in xl_start, not in a reclaiming
 routine like xl_txeof, because xl_start is the only place that kicks
 the controller to send queued frames. If you see watchdog timeouts, it
 means a frame queued in xl_start was not sent within the timeout
 period, so adjusting the timeout in xl_txeof (other than disarming it
 when no frames are pending) is a bug.
 
 > This does not disable the watchdog but merely restarts the countdown. I
 > wouldn't disable that watchdog, because some xl NICs are prone to freezing up.
 > 
 > On my Dell laptop, this patch did reduce the frequency of the resets, which were
 > very annoying in that they made the system lose all its connections.
 > 
 > I believe the real problem and fix lie elsewhere. I would look at the
 > interrupt handling logic introduced around that time, and the peculiarities
 > of the xl.
 > 
 
 To write a real fix I need to know why and when it happens. Recent
 FreeBSD releases include code that checks whether an xl(4) watchdog
 timeout was caused by missed Tx completion interrupts. If so, xl(4)
 just shows an informational message and does not reinitialize the
 controller. So it would help if you could test a more recent FreeBSD
 release (8.1-RELEASE or 7.3-RELEASE) and let me know whether it makes
 any difference.
 Another thing I'd like to see is your "pciconf -lcbv" output, to
 narrow down the exact controller revision. If you can trigger the
 issue easily, please let me know how you do it.
 
 Thanks.
 

From: Pyun YongHyeon <pyunyh@gmail.com>
To: Ping Mai <pingmai@yahoo.com>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/129352: [xl] [patch] xl0 watchdog timeout
Date: Sat, 25 Sep 2010 15:45:55 -0700

 On Sat, Sep 25, 2010 at 05:22:00AM -0700, Ping Mai wrote:
 > I remember the same hardware did not have the xl reset problem until I upgraded
 > to that particular release. That's why I thought it was related to the interrupt
 
 I also vaguely remember xl(4) watchdog timeout issues in the 6.x days.
 That's the reason I asked whether you still see the issue on recent
 FreeBSD releases.
 
 > handling layer.
 > At the time, I heard of others having this reset problem with the xl, and it was
 > not limited to the xl chip. I knew it was not the correct fix, but it did reduce
 > those annoying resets by 95%. I will be in South America until December or
 > January, but I will certainly take a look whenever access permits. Thanks.
 > 
 
 Ok, if you find some spare time in future let me know. Let's fix
 it.
State-Changed-From-To: feedback->open 
State-Changed-By: yongari 
State-Changed-When: Sat Sep 25 22:50:42 UTC 2010 
State-Changed-Why:  
Feedback received. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=129352 
>Unformatted:
