From Tor.Egge@broadpark.no  Tue Jun  3 02:02:47 2008
Return-Path: <Tor.Egge@broadpark.no>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7661F1065686
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  3 Jun 2008 02:02:47 +0000 (UTC)
	(envelope-from Tor.Egge@broadpark.no)
Received: from osl1smout1.broadpark.no (osl1smout1.broadpark.no [80.202.4.58])
	by mx1.freebsd.org (Postfix) with ESMTP id 308DC8FC33
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  3 Jun 2008 02:02:47 +0000 (UTC)
	(envelope-from Tor.Egge@broadpark.no)
Received: from osl1sminn1.broadpark.no ([80.202.4.59])
 by osl1smout1.broadpark.no
 (Sun Java(tm) System Messaging Server 6.3-3.01 (built Jul 12 2007; 32bit))
 with ESMTP id <0K1V00ETS48LORC0@osl1smout1.broadpark.no> for
 FreeBSD-gnats-submit@freebsd.org; Tue, 03 Jun 2008 03:02:45 +0200 (CEST)
Received: from tegge-laptop.trondheim.corp.yahoo.com ([84.48.203.244])
 by osl1sminn1.broadpark.no
 (Sun Java(tm) System Messaging Server 6.3-3.01 (built Jul 12 2007; 32bit))
 with ESMTP id <0K1V00FAI48KFKT3@osl1sminn1.broadpark.no> for
 FreeBSD-gnats-submit@freebsd.org; Tue, 03 Jun 2008 03:02:45 +0200 (CEST)
Received: from tegge-laptop.trondheim.corp.yahoo.com (localhost [127.0.0.1])
	by tegge-laptop.trondheim.corp.yahoo.com (8.14.2/8.14.2)
 with ESMTP id m5312hlG001802	for <FreeBSD-gnats-submit@freebsd.org>; Tue,
 03 Jun 2008 03:02:43 +0200
Received: (from tegge@localhost)	by tegge-laptop.trondheim.corp.yahoo.com
 (8.14.2/8.14.2/Submit) id m5312g4O001801; Tue,
 03 Jun 2008 03:02:42 +0200 (CEST envelope-from tegge)
Message-Id: <200806030102.m5312g4O001801@tegge-laptop.trondheim.corp.yahoo.com>
Date: Tue, 03 Jun 2008 03:02:42 +0200 (CEST)
From: Tor Egge <Tor.Egge@broadpark.no>
Reply-To: Tor Egge <Tor.Egge@broadpark.no>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: ndis network driver sometimes loses network connection
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         124225
>Category:       kern
>Synopsis:       [ndis] [patch] ndis network driver sometimes loses network connection
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-net
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jun 03 02:10:02 UTC 2008
>Closed-Date:    
>Last-Modified:  Sat Mar 20 02:38:26 UTC 2010
>Originator:     Tor Egge
>Release:        FreeBSD 8.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD tegge-laptop.trondheim.corp.yahoo.com 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sun Jun 1 00:42:26 CEST 2008 root@tegge-laptop.trondheim.corp.yahoo.com:/usr/src/sys/i386/compile/TEGGE_LAPTOP i386

>Description:

Normally, when packets are queued to the ndis network interface, ndis_start()
is called to move packets from the interface send queue to the underlying
NDIS driver.

If the network link is down or the underlying driver is busy transmitting data,
ndis_start() just returns.

When the link goes up, ndis_starttask() is supposed to be called after
ndis_ticktask() in order to transmit already queued packets.

After a watchdog timeout, ndis_starttask() is likewise supposed to be called
after ndis_resettask().

Unfortunately, the work items used to trigger calls to ndis_ticktask(),
ndis_starttask() and ndis_resettask() are placed on separate task lists, which
are handled by separate kernel processes. This loses the ordering information
about when the tasks should be performed relative to each other.

If the interface send queue is full after a watchdog timeout or link-up event,
and the tasks were handled in the wrong order, then further attempts to send
packets via the interface result in ENOBUFS ("No buffer space available"),
since no pending task remains to drain the queue.
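The failure can be sketched as a small, single-threaded C model. All names in it (model_if, model_output(), SENDQ_MAX and so on) are hypothetical and do not appear in if_ndis.c. The model assumes, as described above, that the start routine returns early while the link is down, and additionally assumes that a packet rejected by a full send queue does not trigger the start routine. Real kernel locking and concurrency are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical send queue limit for the model. */
#define SENDQ_MAX 4

/* Hypothetical, simplified model of the ndis transmit path. */
struct model_if {
	bool link_up;     /* set when the tick task sees the link come up */
	int  sendq_len;   /* packets waiting on the interface send queue */
	int  transmitted; /* packets handed to the underlying NDIS driver */
};

/* Stand-in for ndis_start(): returns early while the link is down. */
static void model_start(struct model_if *m)
{
	assert(m->sendq_len <= SENDQ_MAX);
	if (!m->link_up)
		return;
	m->transmitted += m->sendq_len;
	m->sendq_len = 0;
}

/* Stand-in for queuing a packet: a full queue rejects the packet with
 * ENOBUFS and (by assumption) does not call the start routine. */
static int model_output(struct model_if *m)
{
	if (m->sendq_len >= SENDQ_MAX)
		return ENOBUFS;
	m->sendq_len++;
	model_start(m);
	return 0;
}

/* Stand-ins for the deferred tick and start tasks. */
static void model_ticktask(struct model_if *m)
{
	m->link_up = true;
}

static void model_starttask(struct model_if *m)
{
	model_start(m);
}

/* Fill the send queue while the link is down, run the two deferred
 * tasks in the given order, then try to send one more packet. */
static int model_scenario(bool tick_first)
{
	struct model_if m = { false, 0, 0 };
	int i;

	for (i = 0; i < SENDQ_MAX; i++)
		(void)model_output(&m);

	if (tick_first) {
		model_ticktask(&m);
		model_starttask(&m);
	} else {
		model_starttask(&m); /* runs too early: link still down */
		model_ticktask(&m);
	}
	return model_output(&m);
}
```

In this model, model_scenario(true) returns 0, while model_scenario(false) returns ENOBUFS; every later model_output() call keeps failing, because nothing calls model_start() on the full queue again.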

>How-To-Repeat:

Use the ndis driver for a wireless network card in an area with many APs on
nearby channels, on a machine with many active TCP connections. This causes
the link to go down temporarily every few hours, and the interface send queue
to fill up while the link is down.

>Fix:

A proper fix is to ensure that related tasks are handled in the correct order.
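One way to obtain that ordering, sketched here under the assumption that related tasks can share a single FIFO drained by one worker, is shown below. The queue (work_queue, wq_enqueue(), wq_drain()) and the demo callbacks are hypothetical illustrations, not the ntoskrnl emulation's work item API and not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity of the illustrative queue. */
#define WQ_MAX 8

typedef void (*work_fn)(void *arg);

/* One FIFO drained by a single worker, so items run in queue order. */
struct work_queue {
	work_fn fn[WQ_MAX];
	void   *arg[WQ_MAX];
	size_t  head; /* index of the next item to run */
	size_t  tail; /* index of the next free slot */
};

static int wq_enqueue(struct work_queue *wq, work_fn fn, void *arg)
{
	if (wq->tail - wq->head >= WQ_MAX)
		return -1; /* queue full */
	wq->fn[wq->tail % WQ_MAX] = fn;
	wq->arg[wq->tail % WQ_MAX] = arg;
	wq->tail++;
	return 0;
}

/* The single worker's loop: run queued items, oldest first. */
static void wq_drain(struct work_queue *wq)
{
	while (wq->head != wq->tail) {
		size_t i = wq->head % WQ_MAX;

		wq->head++;
		wq->fn[i](wq->arg[i]);
	}
}

/* Hypothetical demo: "tick" brings the link up, "start" records
 * whether the link was up when it ran. */
struct demo_state {
	int link_up;
	int started_with_link;
};

static void demo_tick(void *p)
{
	((struct demo_state *)p)->link_up = 1;
}

static void demo_start(void *p)
{
	struct demo_state *s = p;

	s->started_with_link = s->link_up;
}

static int demo_ordered(void)
{
	struct work_queue wq = { .head = 0, .tail = 0 };
	struct demo_state s = { 0, 0 };

	if (wq_enqueue(&wq, demo_tick, &s) != 0 ||
	    wq_enqueue(&wq, demo_start, &s) != 0)
		return -1;
	wq_drain(&wq);
	return s.started_with_link;
}
```

Because both items sit on the same queue, demo_ordered() returns 1: the start callback always observes the link state set by the tick callback queued before it.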

The following kludge just adds extra attempts to schedule calls to
ndis_starttask() as part of the processing of ndis_ticktask() and
ndis_resettask(). It depends on defensive coding in IoQueueWorkItem(),
i.e., that it does nothing if the work item is already queued.
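The defensive behaviour the kludge relies on can be sketched with a work item that carries a "queued" flag. This is a hypothetical, single-threaded illustration (work_item, item_enqueue() and item_run() are invented names); it does not show how IoQueueWorkItem() is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work item that remembers whether it is pending. */
struct work_item {
	bool queued; /* true from enqueue until the worker picks it up */
	int  runs;   /* how many times the handler has run */
};

/* Queue the item unless it is already pending.
 * Returns true if it was newly queued, false if the call was a no-op. */
static bool item_enqueue(struct work_item *wi)
{
	if (wi->queued)
		return false; /* already pending: do nothing */
	wi->queued = true;
	return true;
}

/* Worker side: clear the flag before running the handler, so a
 * request made after this point queues the item again. */
static void item_run(struct work_item *wi)
{
	if (!wi->queued)
		return;
	wi->queued = false;
	wi->runs++; /* stands in for calling the handler */
}
```

Under these assumptions, a duplicate request while the start item is still pending is dropped, and a request made after the start item already ran too early queues it once more, which is the case the extra calls in the patch are meant to cover.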

Index: sys/dev/if_ndis/if_ndis.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/if_ndis/if_ndis.c,v
retrieving revision 1.140
diff -u -r1.140 if_ndis.c
--- sys/dev/if_ndis/if_ndis.c	30 May 2008 07:17:51 -0000	1.140
+++ sys/dev/if_ndis/if_ndis.c	31 May 2008 21:24:14 -0000
@@ -1617,6 +1617,7 @@
 		IoQueueWorkItem(sc->ndis_tickitem, 
 		    (io_workitem_func)ndis_ticktask_wrap,
 		    WORKQUEUE_CRITICAL, sc);
+		/* XXX: startitem might be handled before tickitem */
 		IoQueueWorkItem(sc->ndis_startitem,
 		    (io_workitem_func)ndis_starttask_wrap,
 		    WORKQUEUE_CRITICAL, ifp);
@@ -1699,6 +1700,11 @@
 		}
 		NDIS_LOCK(sc);
 		if_link_state_change(sc->ifp, LINK_STATE_UP);
+		/* XXX: Start kludge */
+		IoQueueWorkItem(sc->ndis_startitem,
+		    (io_workitem_func)ndis_starttask_wrap,
+		    WORKQUEUE_CRITICAL, sc->ifp);
+		/* XXX: End kludge */
 	}
 
 	if (sc->ndis_link == 1 &&
@@ -3112,6 +3118,11 @@
 
 	sc = arg;
 	ndis_reset_nic(sc);
+	/* XXX: Start kludge */
+	IoQueueWorkItem(sc->ndis_startitem,
+	    (io_workitem_func)ndis_starttask_wrap,
+	    WORKQUEUE_CRITICAL, sc->ifp);
+	/* XXX: End kludge */
 	return;
 }
 
@@ -3131,6 +3142,7 @@
 	IoQueueWorkItem(sc->ndis_resetitem,
 	    (io_workitem_func)ndis_resettask_wrap,
 	    WORKQUEUE_CRITICAL, sc);
+	/* XXX: startitem might be handled before resetitem */
 	IoQueueWorkItem(sc->ndis_startitem,
 	    (io_workitem_func)ndis_starttask_wrap,
 	    WORKQUEUE_CRITICAL, ifp);
 
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Jun 3 02:46:09 UTC 2008 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=124225 
Responsible-Changed-From-To: freebsd-net->cokane 
Responsible-Changed-By: cokane 
Responsible-Changed-When: Wed Jul 2 14:56:51 UTC 2008 
Responsible-Changed-Why:  
PR refers to a recent commit of changes that I made, I will look 
into solving this problem in my development branch. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=124225 
Responsible-Changed-From-To: cokane->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Mar 20 02:37:53 UTC 2010 
Responsible-Changed-Why:  
returned to the pool by request (some time ago.) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=124225 
>Unformatted:
I was the last one with my hand in this jar. I'll look into it and
see what I can do.

