From nobody@FreeBSD.org  Fri May  3 12:58:20 2013
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by hub.freebsd.org (Postfix) with ESMTP id 78B316D1
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  3 May 2013 12:58:20 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [69.147.83.34])
	by mx1.freebsd.org (Postfix) with ESMTP id 6B2901F07
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  3 May 2013 12:58:20 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.5/8.14.5) with ESMTP id r43CwKKN023534
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 3 May 2013 12:58:20 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.5/8.14.5/Submit) id r43CwK4x023533;
	Fri, 3 May 2013 12:58:20 GMT
	(envelope-from nobody)
Message-Id: <201305031258.r43CwK4x023533@red.freebsd.org>
Date: Fri, 3 May 2013 12:58:20 GMT
From: Luiz Otavio O Souza <loos.br@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [patch] [arge] if_arge/bootp race under some circumstances
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         178318
>Category:       kern
>Synopsis:       [patch] [arge] if_arge/bootp race under some circumstances
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    loos
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 03 13:00:01 UTC 2013
>Closed-Date:    Thu Aug 29 13:33:30 UTC 2013
>Last-Modified:  Thu Aug 29 13:33:30 UTC 2013
>Originator:     Luiz Otavio O Souza
>Release:        -head r250121
>Organization:
>Environment:
FreeBSD rb433 10.0-CURRENT FreeBSD 10.0-CURRENT #61 r250121M: Fri May  3 09:45:51 BRT 2013     root@devel:/data/rb/rb433/obj/mips.mips/data/rb/rb433/src/sys/RSPRO  mips
>Description:
I discovered (the hard way :) that adding some debug output to
arge_init_locked() (like the example below) causes bootp to fail.

Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c      (revision 250121)
+++ mips/atheros/if_arge.c      (working copy)
@@ -1006,6 +1006,7 @@
 
        ARGE_LOCK_ASSERT(sc);
 
+printf("%s: called\n", __func__);
        arge_stop(sc);
 
        /* Init circular RX list. */


Bootp will loop for a while with the timeout message until the kernel panics:


arge0: link state changed to UP
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
panic: EFBIG
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x4c: lui     at,0x8059
db> 


After confirming that it really was the printf() that caused the problem,
I started to look at why arge_init() was being called twice between the
timeouts and why that made bootp time out and fail to boot.

A few things contribute to this race. First, arge_init() forces a full
stop->start cycle every time it is called, so with the following debug
output we can see what happens:

bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0


If arge_init() isn't fast enough while resetting the driver on the second
netmask change, the interface will miss the bootp response packet.
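
To make the timing argument concrete, here is a minimal userland sketch of
the window. It is only a model, not driver code: the struct, the function
and the abstract "ticks" are hypothetical names made up for illustration,
and the real reset time depends on the hardware and on whatever extra work
(such as a printf()) is done inside arge_init_locked().

```c
#include <stdbool.h>

/*
 * Hypothetical model: time is counted in abstract ticks, starting at the
 * moment the BOOTP request has been handed to sosend().
 */
struct mock_timeline {
	int reset_ticks;	/* how long the stop->start cycle keeps RX down */
	int reply_tick;		/* tick at which the BOOTP reply arrives */
};

/*
 * The second netmask change re-inits the interface at tick 0, right after
 * the request was sent.  If init restarts the hardware, RX is down during
 * [0, reset_ticks) and any reply arriving in that window is lost.
 */
bool
reply_received(const struct mock_timeline *t, bool restart_on_init)
{
	if (!restart_on_init)
		return (true);	/* RX never went down */
	return (t->reply_tick >= t->reset_ticks);
}
```

With reset_ticks = 5 and reply_tick = 3 the reply is lost whenever init
restarts the interface, which is the case the printf() provokes by making
the reset slower. With a shorter reset, or with no restart at all, the
same reply is received.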


>How-To-Repeat:
Add something like this to arge_init_locked():


Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c      (revision 250121)
+++ mips/atheros/if_arge.c      (working copy)
@@ -1006,6 +1006,7 @@
 
        ARGE_LOCK_ASSERT(sc);
 
+printf("%s: called\n", __func__);
        arge_stop(sc);
 
        /* Init circular RX list. */


Add the following to the RSPRO kernel config:


Index: sys/mips/conf/RSPRO
===================================================================
--- sys/mips/conf/RSPRO (revision 250121)
+++ sys/mips/conf/RSPRO (working copy)
@@ -28,3 +28,12 @@
 # Boot off of flash
 options                ROOTDEVNAME=\"ufs:redboot/rootfs.uzip\"
 
+options                NFSCL
+options                NFS_ROOT
+options                BOOTP
+options                BOOTP_NFSROOT
+options                BOOTP_NFSV3
+options                BOOTP_WIRED_TO=arge0
+options                BOOTP_COMPAT
+
+


Then try to boot via bootp.
>Fix:
The fix is simply to refuse to proceed with the driver restart if the
interface is already 'up' and 'running'. There is no need to restart
the driver every time we change or add an IP address or netmask.

Then, since we only proceed when the driver is stopped, we no longer need
to force the stop->start cycle.
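
As a rough illustration of the guard, the userland sketch below mimics the
check with mock types. Everything prefixed with mock_/MOCK_ is a stand-in
invented for this example (the flag values are placeholders, not taken from
the kernel headers), and the ring_inits counter only exists to make the
effect visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values for this sketch only. */
#define	MOCK_IFF_UP		0x1
#define	MOCK_IFF_DRV_RUNNING	0x40

struct mock_ifnet {
	int if_flags;		/* administrative state, set by the stack */
	int if_drv_flags;	/* driver state, set by the driver */
};

struct mock_softc {
	struct mock_ifnet ifp;
	bool locked;		/* models holding the driver lock */
	int ring_inits;		/* counts how often the RX ring was rebuilt */
};

void
mock_init_locked(struct mock_softc *sc)
{
	struct mock_ifnet *ifp = &sc->ifp;

	assert(sc->locked);	/* plays the role of ARGE_LOCK_ASSERT(sc) */

	/* Already up and running: leave the hardware alone. */
	if ((ifp->if_flags & MOCK_IFF_UP) &&
	    (ifp->if_drv_flags & MOCK_IFF_DRV_RUNNING))
		return;

	sc->ring_inits++;	/* stands in for the RX/TX ring setup */
	ifp->if_drv_flags |= MOCK_IFF_DRV_RUNNING;
}
```

Calling mock_init_locked() several times on an interface that has
MOCK_IFF_UP set rebuilds the ring only on the first call; the later calls,
like the ones triggered by the netmask changes, return early and leave RX
untouched.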

The leakage that leads to the panic will be fixed in a subsequent PR.

Patch attached with submission follows:

Index: sys/mips/atheros/if_arge.c
===================================================================
--- sys/mips/atheros/if_arge.c	(revision 250121)
+++ sys/mips/atheros/if_arge.c	(working copy)
@@ -1006,7 +1006,8 @@
 
 	ARGE_LOCK_ASSERT(sc);
 
-	arge_stop(sc);
+	if ((ifp->if_flags & IFF_UP) && (ifp->if_drv_flags & IFF_DRV_RUNNING))
+		return;
 
 	/* Init circular RX list. */
 	if (arge_rx_ring_init(sc) != 0) {


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-mips 
Responsible-Changed-By: adrian 
Responsible-Changed-When: Thu May 16 03:23:13 UTC 2013 
Responsible-Changed-Why:  
assign to owner 


http://www.freebsd.org/cgi/query-pr.cgi?pr=178318 
Responsible-Changed-From-To: freebsd-mips->loos 
Responsible-Changed-By: loos 
Responsible-Changed-When: Wed Jul 31 13:45:03 UTC 2013 
Responsible-Changed-Why:  
Take it 

http://www.freebsd.org/cgi/query-pr.cgi?pr=178318 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/178318: commit references a PR
Date: Thu, 29 Aug 2013 12:48:29 +0000 (UTC)

 Author: loos
 Date: Thu Aug 29 12:48:12 2013
 New Revision: 255021
 URL: http://svnweb.freebsd.org/changeset/base/255021
 
 Log:
   Prevent the full restart cycle every time arge_start() is called.  Only
   (re)start the interface when it is down.  This change fix a race with
   BOOTP where the response packet is lost because the interface is being
   reset by a netmask change right after send the packet.
   
   PR:		178318
   Approved by:	adrian (mentor)
 
 Modified:
   head/sys/mips/atheros/if_arge.c
 
 Modified: head/sys/mips/atheros/if_arge.c
 ==============================================================================
 --- head/sys/mips/atheros/if_arge.c	Thu Aug 29 12:25:12 2013	(r255020)
 +++ head/sys/mips/atheros/if_arge.c	Thu Aug 29 12:48:12 2013	(r255021)
 @@ -1019,7 +1019,8 @@ arge_init_locked(struct arge_softc *sc)
  
  	ARGE_LOCK_ASSERT(sc);
  
 -	arge_stop(sc);
 +	if ((ifp->if_flags & IFF_UP) && (ifp->if_drv_flags & IFF_DRV_RUNNING))
 +		return;
  
  	/* Init circular RX list. */
  	if (arge_rx_ring_init(sc) != 0) {
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->closed 
State-Changed-By: loos 
State-Changed-When: Thu Aug 29 13:33:30 UTC 2013 
State-Changed-Why:  
Committed. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=178318 
>Unformatted:
