From nobody@FreeBSD.org  Sun Feb 12 23:34:22 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 25A3A106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 12 Feb 2012 23:34:22 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 061EE8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 12 Feb 2012 23:34:22 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q1CNYLI0064925
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 12 Feb 2012 23:34:21 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q1CNYLcM064924;
	Sun, 12 Feb 2012 23:34:21 GMT
	(envelope-from nobody)
Message-Id: <201202122334.q1CNYLcM064924@red.freebsd.org>
Date: Sun, 12 Feb 2012 23:34:21 GMT
From: Adrian Chadd <adrian@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ath] vap->iv_bss race conditions causing crashes inside ath_beacon_alloc and similar
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         165060
>Category:       kern
>Synopsis:       [ath] vap->iv_bss race conditions causing crashes inside ath_beacon_alloc and similar
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-wireless
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Feb 12 23:40:05 UTC 2012
>Closed-Date:    
>Last-Modified:  Mon Feb 13 00:30:10 UTC 2012
>Originator:     Adrian Chadd
>Release:        9.0-RELEASE, running -HEAD ath/net80211
>Organization:
>Environment:
>Description:
A variety of crashes inside the ath driver can be traced to races between modifying, reallocating and freeing vap->iv_bss.

From an email I sent to freebsd-wireless:

I've noticed some kernel panics in net80211/ath in -HEAD. In all instances it boils down to a now-invalid ieee80211_node - either it's partially allocated/copied, or it's been recently freed.



This became increasingly obvious when doing DFS CAC, as the kernel was now changing the channel quite frequently on me whilst simulating/processing radar events. I've since found I can mostly reproduce it in the lab (when surrounded by ridiculous levels of RX interference traffic, triggering all kinds of events) whilst creating/destroying VAPs.

Now that I have debugging code in place (which as a side effect makes it very difficult now to cause a crash, let alone tickle the race condition) it's glaringly obvious what's going on.

There are five contexts in which stuff can occur, at least in the net80211/ath case:

* the swi (ie ath_intr(), ath_beacon_proc());
* the ath taskqueue;
* the net80211 taskqueue;
* the ioctl() context, coming up from a userland process;
* a callout running in the clock thread.

Now, callouts should _hopefully_ be grabbing and releasing locks correctly. We've found a few spots where they weren't (leading to quite silly state races and crashes.)

I'm going to ignore the obvious possible problems with multiple concurrent processes doing ioctl()s. I'm simply going to operate on the principle that the multiple-ioctl() path is fine.

It seems that -obtaining- a reference to vap->iv_bss isn't locked. So in (say) ieee80211_sta_join1() the iv_bss node can have its reference dropped and be freed. If this is going on concurrently with (say) something in the net80211 taskqueue (eg a newstate call) then I _think_ it's possible for the ath_newstate() code to read vap->iv_bss at the same moment it is being freed in ieee80211_sta_join1() (or similar.) So the ath_newstate() code will end up with a 'ni' that has just been freed.
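
To make the idea of a "locked ref" concrete, here is a minimal userland sketch. Everything in it is hypothetical: struct node, struct vap, vap_bss_ref(), vap_node_unref() and vap_bss_replace() are stand-ins invented for illustration and are not net80211 structures or functions, and a pthread mutex stands in for whatever lock the kernel would actually use. It only shows the pattern: read the bss pointer and bump its refcount under the same lock that the replace path holds when it swaps the pointer, so a reader can never be handed a node that is already on its way to being freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real net80211 structures. */
struct node {
	int refcnt;		/* protected by vap->lock in this sketch */
	int intval;		/* some payload, e.g. a beacon interval */
};

struct vap {
	pthread_mutex_t lock;	/* serialises bss pointer reads and swaps */
	struct node *bss;	/* plays the role of vap->iv_bss */
};

/* Allocate a node holding one reference (the vap's own). */
static struct node *
node_alloc(int intval)
{
	struct node *n = calloc(1, sizeof(*n));

	if (n == NULL)
		return (NULL);
	n->refcnt = 1;
	n->intval = intval;
	return (n);
}

/* Read the current bss and take a reference, all under the lock. */
static struct node *
vap_bss_ref(struct vap *v)
{
	struct node *n;

	pthread_mutex_lock(&v->lock);
	n = v->bss;
	if (n != NULL)
		n->refcnt++;
	pthread_mutex_unlock(&v->lock);
	return (n);
}

/* Drop one reference; free the node when the last one goes away. */
static void
vap_node_unref(struct vap *v, struct node *n)
{
	int last;

	pthread_mutex_lock(&v->lock);
	assert(n->refcnt > 0);
	last = (--n->refcnt == 0);
	pthread_mutex_unlock(&v->lock);
	if (last)
		free(n);
}

/* Swap in a new bss (as a join path might) and drop the vap's old ref. */
static void
vap_bss_replace(struct vap *v, struct node *newbss)
{
	struct node *old;

	pthread_mutex_lock(&v->lock);
	old = v->bss;
	v->bss = newbss;
	pthread_mutex_unlock(&v->lock);
	if (old != NULL)
		vap_node_unref(v, old);
}
```

With this shape, a caller that did vap_bss_ref() before a concurrent vap_bss_replace() keeps a valid (if stale) node until it calls vap_node_unref(). It does not address the partially set up node case; that would need the new node to be fully initialised before it is published.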

I've seen another crash in the net80211_ht code where it _looks_ like the bss node wasn't entirely set up - bsschan was 0xffff - so the kernel panicked hard there.

This likely explains a lot of the "weird stuff" people have been reporting. I also think the bgscan race is related to this - I can't help but wonder if the bgscan callout/event is also coinciding with wpa_supplicant doing stuff, and a race condition ends up leaving the vap w/ the sta power save flag set.

I don't yet have a solution to all of this - I just wanted to brain dump what I've seen thus far.
>How-To-Repeat:
It's unfortunately not easy to reproduce in a clean environment. It seemed very easy to reproduce in a radio-noisy environment where the RX handler is constantly being scheduled.
>Fix:
Someone pointed this out:

Here is my brain dump.

A while ago the USB wifi drivers had a similar issue (a race in the
80211 stack). It's worth checking this rev.
http://svnweb.freebsd.org/base?view=revision&revision=212127

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-wireless 
Responsible-Changed-By: adrian 
Responsible-Changed-When: Mon Feb 13 00:14:06 UTC 2012 
Responsible-Changed-Why:  
Assign 

http://www.freebsd.org/cgi/query-pr.cgi?pr=165060 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/165060: commit references a PR
Date: Mon, 13 Feb 2012 00:28:55 +0000 (UTC)

 Author: adrian
 Date: Mon Feb 13 00:28:41 2012
 New Revision: 231571
 URL: http://svn.freebsd.org/changeset/base/231571
 
 Log:
   Attempt to address some potential vap->iv_bss race conditions.
   
   There are unfortunately a number of situations where vap->iv_bss is changed
   or freed by some code in net80211.  Because multiple threads can concurrently
   be doing work (and the vap->iv_bss access isn't at all done behind any kind
   of lock), it's quite possible that:
   
   * a change will occur in one thread - eg, by a call through
     ieee80211_sta_join1();
   * a state change occurs in another thread - eg an RX is scheduled
     in the ath tasklet and it calls ieee80211_input_mimo_all(), which
     does dereference vap->iv_bss;
   * these two executing concurrently, causing things to explode.
   
   Another instance is ath_beacon_alloc() which takes an ieee80211_node *.
   It's called with the vap->iv_bss node from ath_newstate(). If the node has
   changed in the meantime (say it's been freed elsewhere) the reference
   that it grabbed _before_ refcounting it may be stale.
   
   I would _prefer_ that these sorts of things were serialised somewhere but
   that may be a bit much to ask.  Instead, the best we can (currently) hope
   is that the underlying bss node is still (somewhat) valid.
   
   There is a related PR (kern/164382) described by the first case above.
   That should be fixed by properly serialising the RX path and reset path
   so an RX can't occur at the same time as the vap free/shutdown path.
   
   This is inspired by some related fixes in r212127.
   
   PR: kern/165060
 
 Modified:
   head/sys/dev/ath/if_ath.c
 
 Modified: head/sys/dev/ath/if_ath.c
 ==============================================================================
 --- head/sys/dev/ath/if_ath.c	Sun Feb 12 23:48:39 2012	(r231570)
 +++ head/sys/dev/ath/if_ath.c	Mon Feb 13 00:28:41 2012	(r231571)
 @@ -1669,6 +1669,7 @@ ath_bmiss_vap(struct ieee80211vap *vap)
  		struct ath_softc *sc = ifp->if_softc;
  		u_int64_t lastrx = sc->sc_lastrx;
  		u_int64_t tsf = ath_hal_gettsf64(sc->sc_ah);
 +		/* XXX should take a locked ref to iv_bss */
  		u_int bmisstimeout =
  			vap->iv_bmissthreshold * vap->iv_bss->ni_intval * 1024;
  
 @@ -3245,7 +3246,7 @@ ath_beacon_config(struct ath_softc *sc, 
  
  	if (vap == NULL)
  		vap = TAILQ_FIRST(&ic->ic_vaps);	/* XXX */
 -	ni = vap->iv_bss;
 +	ni = ieee80211_ref_node(vap->iv_bss);
  
  	/* extract tstamp from last beacon and convert to TU */
  	nexttbtt = TSF_TO_TU(LE_READ_4(ni->ni_tstamp.data + 4),
 @@ -3415,6 +3416,7 @@ ath_beacon_config(struct ath_softc *sc, 
  			ath_beacon_start_adhoc(sc, vap);
  	}
  	sc->sc_syncbeacon = 0;
 +	ieee80211_free_node(ni);
  #undef FUDGE
  #undef TSF_TO_TU
  }
 @@ -3853,6 +3855,7 @@ ath_recv_mgmt(struct ieee80211_node *ni,
  	switch (subtype) {
  	case IEEE80211_FC0_SUBTYPE_BEACON:
  		/* update rssi statistics for use by the hal */
 +		/* XXX unlocked check against vap->iv_bss? */
  		ATH_RSSI_LPF(sc->sc_halstats.ns_avgbrssi, rssi);
  		if (sc->sc_syncbeacon &&
  		    ni == vap->iv_bss && vap->iv_state == IEEE80211_S_RUN) {
 @@ -5721,7 +5724,7 @@ ath_newstate(struct ieee80211vap *vap, e
  		taskqueue_unblock(sc->sc_tq);
  	}
  
 -	ni = vap->iv_bss;
 +	ni = ieee80211_ref_node(vap->iv_bss);
  	rfilt = ath_calcrxfilter(sc);
  	stamode = (vap->iv_opmode == IEEE80211_M_STA ||
  		   vap->iv_opmode == IEEE80211_M_AHDEMO ||
 @@ -5752,7 +5755,8 @@ ath_newstate(struct ieee80211vap *vap, e
  
  	if (nstate == IEEE80211_S_RUN) {
  		/* NB: collect bss node again, it may have changed */
 -		ni = vap->iv_bss;
 +		ieee80211_free_node(ni);
 +		ni = ieee80211_ref_node(vap->iv_bss);
  
  		DPRINTF(sc, ATH_DEBUG_STATE,
  		    "%s(RUN): iv_flags 0x%08x bintvl %d bssid %s "
 @@ -5875,6 +5879,7 @@ ath_newstate(struct ieee80211vap *vap, e
  #endif
  	}
  bad:
 +	ieee80211_free_node(ni);
  	return error;
  }
  
 @@ -5893,6 +5898,7 @@ ath_setup_stationkey(struct ieee80211_no
  	struct ath_softc *sc = vap->iv_ic->ic_ifp->if_softc;
  	ieee80211_keyix keyix, rxkeyix;
  
 +	/* XXX should take a locked ref to vap->iv_bss */
  	if (!ath_key_alloc(vap, &ni->ni_ucastkey, &keyix, &rxkeyix)) {
  		/*
  		 * Key cache is full; we'll fall back to doing
 @@ -6448,6 +6454,7 @@ ath_tdma_config(struct ath_softc *sc, st
  			return;
  		}
  	}
 +	/* XXX should take a locked ref to iv_bss */
  	tp = vap->iv_bss->ni_txparms;
  	/*
  	 * Calculate the guard time for each slot.  This is the
 @@ -6697,6 +6704,7 @@ ath_tdma_beacon_send(struct ath_softc *s
  		 * Record local TSF for our last send for use
  		 * in arbitrating slot collisions.
  		 */
 +		/* XXX should take a locked ref to iv_bss */
  		vap->iv_bss->ni_tstamp.tsf = ath_hal_gettsf64(ah);
  	}
  }
 
>Unformatted:
