From nobody@FreeBSD.org  Thu Sep 24 15:53:47 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 47EF31065679
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 24 Sep 2009 15:53:47 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 378D88FC0A
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 24 Sep 2009 15:53:47 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n8OFrkAo016110
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 24 Sep 2009 15:53:46 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n8OFrkS8016097;
	Thu, 24 Sep 2009 15:53:46 GMT
	(envelope-from nobody)
Message-Id: <200909241553.n8OFrkS8016097@www.freebsd.org>
Date: Thu, 24 Sep 2009 15:53:46 GMT
From: David Horn <dhorn2000@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [lagg] + wlan boot timing (EBUSY)
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         139117
>Category:       kern
>Synopsis:       [lagg] + wlan boot timing (EBUSY)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-net
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 24 16:00:11 UTC 2009
>Closed-Date:    
>Last-Modified:  Sat Mar 20 01:00:13 UTC 2010
>Originator:     David Horn
>Release:        8.0 RC1
>Organization:
>Environment:
FreeBSD lagg 8.0-RC1 FreeBSD 8.0-RC1 #11 r197417: Wed Sep 23 01:05:15
EDT 2009     root@lagg:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
I have been trying to track down a problem with my lagg connection
sometimes not properly enabling wlan as fallback on boot.  It would
work properly about 60% of the time.  The other times, it would fail
with SIOCSLAGGPORT: Device busy

Here is the relevant rc.conf entries:

ifconfig_bfe0="up"
wlans_iwn0="wlan0"
ifconfig_wlan0="WPA"
ifconfig_iwn0="ether 00:1c:23:98:2c:5d"
cloned_interfaces="lagg0"
ipv6_network_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 DHCP"
ipv6_enable="YES"

So, I turned on some logging of all ifconfig commands with timestamps
and stdout/stderr/returncode, and noticed this:

Wed Sep 23 01:39:56 EDT 2009 ifconfig: lagg0 create ;
;; Wed Sep 23 01:39:56 EDT 2009 lagg0 rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: -l ;
iwn0 bfe0 fwe0 fwip0 lo0 lagg0
;; Wed Sep 23 01:39:56 EDT 2009 -l rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: -l ;
iwn0 bfe0 fwe0 fwip0 lo0 lagg0
;; Wed Sep 23 01:39:56 EDT 2009 -l rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: lo0 inet 127.0.0.1 ;
;; Wed Sep 23 01:39:56 EDT 2009 lo0 rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: lo0 up ;
;; Wed Sep 23 01:39:56 EDT 2009 lo0 rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: iwn0 ether 00:1c:23:98:2c:5d ;
;; Wed Sep 23 01:39:56 EDT 2009 iwn0 rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: iwn0 up ;
;; Wed Sep 23 01:39:56 EDT 2009 iwn0 rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: wlan0 create wlandev iwn0 ;
;; Wed Sep 23 01:39:56 EDT 2009 wlan0 rc='0' end.
Wed Sep 23 01:39:56 EDT 2009 ifconfig: wlan0 ;
wlan0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
       ether 00:1c:23:98:2c:5d
       media: IEEE 802.11 Wireless Ethernet autoselect (autoselect)
       status: no carrier
       ssid "" channel 1 (2412 Mhz 11b)
       country US authmode OPEN privacy OFF txpower 14 bmiss 10 scanvalid 60
       wme bintval 0
;; Wed Sep 23 01:39:56 EDT 2009 wlan0 rc='0' end.
Wed Sep 23 01:39:57 EDT 2009 ifconfig: lagg0 laggproto failover
laggport bfe0 laggport wlan0 ;
ifconfig.real: SIOCSLAGGPORT: Device busy
;; Wed Sep 23 01:39:57 EDT 2009 lagg0 rc='1' end.

So, I started looking at the /sys/net/if_lagg.c source, and found the
EBUSY response cases:

This one

/* New lagg port has to be in an idle state */
       if (ifp->if_drv_flags & IFF_DRV_OACTIVE)
               return (EBUSY);

seems to be the culprit, but unfortunately, I'm not familiar enough
with the code to take this much further.  I did build a kernel without
this check, and everything seems to be fixed, but this is obviously
not a real fix to the problem.  So, I would say the fact that
wpa_supplicant is talking to wlan0 (trying to scan/associate/auth)
while lagg is trying to add wlan0 to the portlist is the timing issue.

I confirmed this behavior as follows:

ifconfig wlan0 destroy
ifconfig lagg0 destroy
ifconfig lagg0 create
ifconfig wlan0 create wlandev iwn0  & ; ifconfig lagg0 laggproto
failover laggport bfe0 laggport wlan0
results in:
ifconfig: SIOCSLAGGPORT: Device busy

Someone more clueful than me know of a correct way to fix this
contention issue ?


It is possible that the following reports are related:

http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008776.html
http://lists.freebsd.org/pipermail/freebsd-current/2009-June/007641.html
>How-To-Repeat:
After playing around with this issue, it is 100% reproducible at boot time (at least for me) when you create a lagg group with more than 1 wlan interface. e.g.:

ifconfig_bfe0="up"
wlans_iwn0="wlan0"
wlans_ural0="wlan1"
wlans_ath0="wlan2"
ifconfig_wlan0="WPA"
ifconfig_wlan1="WPA"
ifconfig_wlan2="WPA"
ifconfig_iwn0="ether 00:1c:23:98:2c:5d"
ifconfig_ural0="ether 00:1c:23:98:2c:5d"
ifconfig_ath0="ether 00:1c:23:98:2c:5d"
cloned_interfaces="lagg0"
ipv6_network_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 laggport wlan1 laggport wlan2 DHCP"
ipv6_enable="YES"


>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Thu Sep 24 20:30:23 UTC 2009 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=139117 

From: David Horn <dhorn2000@gmail.com>
To: bug-followup@freebsd.org
Cc:  
Subject: Re: kern/139117: [lagg] + wlan boot timing (EBUSY)
Date: Wed, 2 Dec 2009 20:58:06 -0500

 Just for anyone else running into this problem, a "hack" workaround is
 to force the raw wireless interface (wlan0) and the raw wired
 interface (bfe0 in my case)  to not accept IPv6 router advertisements,
 and just allow the lagg0 interface to accept them.  This causes the
 timing window for OACTIVE (and thus the EBUSY response) to be MUCH
 smaller, and now I only very rarely see this issue.
 
 Example 8.0 rc.conf hack to use ndp to disable accepting IPv6 rtadv:
 
 ifconfig_bfe0="ether 00:1c:23:98:2c:5d `ndp -i bfe0 nud -accept_rtadv
 >/dev/null 2>&1`"
 ifconfig_wlan0="WPA `ndp -i wlan0 nud -accept_rtadv >/dev/null 2>&1`"
 ifconfig_iwn0="ether 00:1c:23:98:2c:5d"
 cloned_interfaces="lagg0"
 ipv6_network_interfaces="lagg0"
 ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 DHCP"
 ipv6_enable="YES"
 
 Note, you will need to ignore any ndp error messages out of rc.conf at
 boot time, as this will cause spurious (but innocuous) console error
 messages about ndp.
 
 Example 9.0 rc.conf settings:
 
 ifconfig_iwn0="ether 00:1c:23:98:2c:5d"
 wlans_iwn0="wlan0"
 ifconfig_bfe0="UP"
 ifconfig_bfe0_ipv6="inet6 ifdisabled -nud -auto_linklocal"
 ifconfig_wlan0="WPA"
 ifconfig_wlan0_ipv6="inet6 ifdisabled -nud -auto_linklocal"
 cloned_interfaces="lagg0"
 ifconfig_lagg0="laggproto failover laggport bfe0 laggport wlan0 DHCP"
 ifconfig_lagg0_ipv6="inet6 accept_rtadv"
 ipv6_prefer="YES"
 
 This was thanks to sam@ for pointing me away from wpa_supplicant and
 towards IPv6 as a likely culprit for causing the OACTIVE flag to be
 set on the interface causing the ifconfig lagg0 call to not add the
 wireless interface.
 
 Of course in 9.0 (-current as of the moment), there is the new
 facility to disable IPv6 router advertisements (and other ipv6 flags)
 via ifconfig so this hack is not needed there.  The contention/timing
 issue still exists, but since users likely to NOT want the non-lagg
 interfaces to get IPv6 addresses from a router advertisement message
 anyway, the work around makes this problem less severe.
 
 The 9.0 example here is of course subject to change...
 
 ---Dave Horn

From: David Horn <dhorn2000@gmail.com>
To: bug-followup@FreeBSD.org, dhorn2000@gmail.com
Cc:  
Subject: Re: kern/139117: [lagg] wlan boot timing (EBUSY)
Date: Fri, 19 Mar 2010 20:59:19 -0400

 --001485f7d62453ff1d048230f91d
 Content-Type: text/plain; charset=ISO-8859-1
 
 Fixed by r204901 - delphij removal of OACTIVE check (also attached as
 a text diff)
 
 After testing this several different ways, this seems to work 100% for me.
 Please close as fixed after MFC.
 
 --Thanks!
 
 --Dave Horn
 
 --001485f7d62453ff1d048230f91d
 Content-Type: text/plain; charset=US-ASCII; name="lagg_oactive.diff.txt"
 Content-Disposition: attachment; filename="lagg_oactive.diff.txt"
 Content-Transfer-Encoding: base64
 X-Attachment-Id: f_g6zpiz4i0
 
 SW5kZXg6IGlmX2xhZ2cuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
 PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBpZl9sYWdnLmMJKHJldmlzaW9uIDIwNDg3
 NykKKysrIGlmX2xhZ2cuYwkod29ya2luZyBjb3B5KQpAQCAtNDg0LDEwICs0ODQsNiBAQAogCWlm
 IChzYy0+c2NfY291bnQgPj0gTEFHR19NQVhfUE9SVFMpCiAJCXJldHVybiAoRU5PU1BDKTsKIAot
 CS8qIE5ldyBsYWdnIHBvcnQgaGFzIHRvIGJlIGluIGFuIGlkbGUgc3RhdGUgKi8KLQlpZiAoaWZw
 LT5pZl9kcnZfZmxhZ3MgJiBJRkZfRFJWX09BQ1RJVkUpCi0JCXJldHVybiAoRUJVU1kpOwotCiAJ
 LyogQ2hlY2sgaWYgcG9ydCBoYXMgYWxyZWFkeSBiZWVuIGFzc29jaWF0ZWQgdG8gYSBsYWdnICov
 CiAJaWYgKGlmcC0+aWZfbGFnZyAhPSBOVUxMKQogCQlyZXR1cm4gKEVCVVNZKTsK
 --001485f7d62453ff1d048230f91d--
>Unformatted:
