From nobody@FreeBSD.org  Sat Feb 25 19:49:51 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E2479106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 25 Feb 2012 19:49:51 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id B220D8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 25 Feb 2012 19:49:51 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q1PJnpTE039878
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 25 Feb 2012 19:49:51 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q1PJnp2p039877;
	Sat, 25 Feb 2012 19:49:51 GMT
	(envelope-from nobody)
Message-Id: <201202251949.q1PJnp2p039877@red.freebsd.org>
Date: Sat, 25 Feb 2012 19:49:51 GMT
From: Adrian Chadd <adrian@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ath] operational mode change doesn't poke the underlying rate control module hard enough
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         165475
>Category:       kern
>Synopsis:       [ath] operational mode change doesn't poke the underlying rate control module hard enough
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-wireless
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Feb 25 19:50:11 UTC 2012
>Closed-Date:    
>Last-Modified:  Sun Feb 26 06:10:10 UTC 2012
>Originator:     Adrian Chadd
>Release:        9.0-RELEASE, w/ -HEAD net80211/ath
>Organization:
>Environment:
>Description:
This reared its ugly head when testing with an AR5211 (11b/11a, no 11g.)


Specifically:

* the operational mode change has occurred (sc->sc_currates is pointing to the 11b table);
* ath_sample_node->ratemask is 0xff for some reason - likely indicating it was assembled from the 11a rate table (which in ath_hal/ar5211/ar5211_phy.c has 8 11a rates in it);
* so ath_rate_findrate() thinks best_rix is fine and the current rate table mapping is fine.
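
As a rough illustration of the mismatch in the points above, the sketch below checks whether every index set in a rate mask still fits the rate table currently in use. The helper names (highest_rix, ratemask_fits_table) are hypothetical and do not exist in the driver, and the 4-entry size used for the 11b table in the usage note is an assumption, not something taken from ar5211_phy.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of ath_rate_sample): return the index of
 * the highest bit set in a rate mask, or -1 if the mask is empty.
 */
static int
highest_rix(uint32_t ratemask)
{
	int rix = -1;

	while (ratemask != 0) {
		rix++;
		ratemask >>= 1;
	}
	return (rix);
}

/*
 * Hypothetical helper: return 1 if every rate index in the mask is a
 * valid index into a table holding rate_count entries, 0 otherwise.
 */
static int
ratemask_fits_table(uint32_t ratemask, int rate_count)
{
	return (highest_rix(ratemask) < rate_count);
}
```

With a mask of 0xff the highest index is 7, so ratemask_fits_table(0xff, 8) holds for the 8-entry 11a table, while ratemask_fits_table(0xff, 4) fails for an assumed 4-entry 11b table. That is the kind of stale state the rate control code would need to notice after a mode change.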

This is likely very similar to other issues where rate control in ath behaves slightly oddly after an operational mode change, if the NIC hasn't transitioned back into the original operating mode. The rate control code isn't informed of the change (it is only told of association/reassociation, and ath_rate_sample only updates the rate table on _new_ associations), so it doesn't realise it has to rethink its current rate table setup.
>How-To-Repeat:


Setup:

* net80211/ath and kernel built with full debugging, assert, witness, etc
* associated to an 11a AP (so it has the 11a OFDM table)
* running iperf
* the session hangs for some reason, I'm not quite sure yet
* .. then the bgscan code kicks in and starts scanning
* .. and for some reason, the NIC is in 11b mode now, and tries TX'ing
* But the "best rix" in ath_rate_findrate (in ath_rate_sample) is referencing an 11a rate, not an 11b rate - ie, rix > the current greatest rix in the config.
* .. so things panic.

>Fix:
I'm not yet sure.

Because of background scanning, it's entirely possible the NIC will spend a non-zero amount of time off channel, TX'ing things which SHOULD have fixed rates.

The ath_rate module code isn't currently informed about channel changes, as the channel change doesn't inform all associated nodes of this fact.

Any rate control lookups during off-channel times will cause things to be confused.

I should first check whether this crash occurred while the NIC was off-channel. If so, it shouldn't have tried TXing a data frame at this point. No, I just checked: ni->ni_vap->iv_flags is 0x430c4010, where 0x80 is IEEE80211_F_SCAN and 0x100 is IEEE80211_F_ASCAN, and neither bit is set. So the VAP was not scanning at the time.
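
The flag check above can be reproduced with a few lines of C. This is only a sketch: the EX_F_* macros and vap_is_scanning() are illustrative names local to this example, and the two bit values are the ones quoted in this report rather than values pulled in from the net80211 headers.

```c
#include <stdint.h>

/* Bit values as quoted in this report; the EX_ names are local to this sketch. */
#define	EX_F_SCAN	0x00000080u	/* reported value of IEEE80211_F_SCAN */
#define	EX_F_ASCAN	0x00000100u	/* reported value of IEEE80211_F_ASCAN */

/*
 * Hypothetical helper: return 1 if either scan bit is set in the given
 * iv_flags value, 0 otherwise.
 */
static int
vap_is_scanning(uint32_t iv_flags)
{
	return ((iv_flags & (EX_F_SCAN | EX_F_ASCAN)) != 0);
}
```

Calling vap_is_scanning(0x430c4010) returns 0, because the low 16 bits of that value are 0x4010 and neither 0x80 nor 0x100 is set there.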

So first let's see whether it can be made obvious _why_ the NIC is in 11b mode. Then, once that's done, figure out why the transition didn't trigger a rate control update.

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-wireless 
Responsible-Changed-By: adrian 
Responsible-Changed-When: Sat Feb 25 19:53:43 UTC 2012 
Responsible-Changed-Why:  
Reassign 


http://www.freebsd.org/cgi/query-pr.cgi?pr=165475 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/165475: commit references a PR
Date: Sun, 26 Feb 2012 06:05:00 +0000 (UTC)

 Author: adrian
 Date: Sun Feb 26 06:04:44 2012
 New Revision: 232170
 URL: http://svn.freebsd.org/changeset/base/232170
 
 Log:
   Add in some debugging code to check whether the current rate table has
   been bait-and-switched from the rate control code.
   
   This will avoid the panic that I saw and will avoid sending invalid rates
   (eg 11a/11g OFDM rates when in 11b, on 11b-only NICs (AR5211)) where the
   rate table is not "big".
   
   It also will point out situations where this occurs for the 11n NICs
   which will have sufficiently large rate tables that "invalid rix" doesn't
   occur.
   
   I'll try to follow this up with a commit that adds a current operating mode
   check. The "rix" is only relevant to the current operating mode and rate
   table.
   
   PR:	kern/165475
 
 Modified:
   head/sys/dev/ath/ath_rate/sample/sample.c
   head/sys/dev/ath/ath_rate/sample/sample.h
 
 Modified: head/sys/dev/ath/ath_rate/sample/sample.c
 ==============================================================================
 --- head/sys/dev/ath/ath_rate/sample/sample.c	Sun Feb 26 02:24:40 2012	(r232169)
 +++ head/sys/dev/ath/ath_rate/sample/sample.c	Sun Feb 26 06:04:44 2012	(r232170)
 @@ -495,6 +495,14 @@ ath_rate_findrate(struct ath_softc *sc, 
  
  	ath_rate_update_static_rix(sc, &an->an_node);
  
 +	if (sn->currates != sc->sc_currates) {
 +		device_printf(sc->sc_dev, "%s: currates != sc_currates!\n",
 +		    __func__);
 +		rix = 0;
 +		*try0 = ATH_TXMAXTRY;
 +		goto done;
 +	}
 +
  	if (sn->static_rix != -1) {
  		rix = sn->static_rix;
  		*try0 = ATH_TXMAXTRY;
 @@ -621,6 +629,20 @@ ath_rate_findrate(struct ath_softc *sc, 
  	}
  	*try0 = mrr ? sn->sched[rix].t0 : ATH_TXMAXTRY;
  done:
 +
 +	/*
 +	 * This bug totally sucks and should be fixed.
 +	 *
 +	 * For now though, let's not panic, so we can start to figure
 +	 * out how to better reproduce it.
 +	 */
 +	if (rix < 0 || rix >= rt->rateCount) {
 +		printf("%s: ERROR: rix %d out of bounds (rateCount=%d)\n",
 +		    __func__,
 +		    rix,
 +		    rt->rateCount);
 +		    rix = 0;	/* XXX just default for now */
 +	}
  	KASSERT(rix >= 0 && rix < rt->rateCount, ("rix is %d", rix));
  
  	*rix0 = rix;
 @@ -1073,6 +1095,8 @@ ath_rate_ctl_reset(struct ath_softc *sc,
          sn->static_rix = -1;
  	ath_rate_update_static_rix(sc, ni);
  
 +	sn->currates = sc->sc_currates;
 +
  	/*
  	 * Construct a bitmask of usable rates.  This has all
  	 * negotiated rates minus those marked by the hal as
 
 Modified: head/sys/dev/ath/ath_rate/sample/sample.h
 ==============================================================================
 --- head/sys/dev/ath/ath_rate/sample/sample.h	Sun Feb 26 02:24:40 2012	(r232169)
 +++ head/sys/dev/ath/ath_rate/sample/sample.h	Sun Feb 26 06:04:44 2012	(r232170)
 @@ -86,6 +86,8 @@ struct sample_node {
  	uint32_t ratemask;		/* bit mask of valid rate indices */
  	const struct txschedule *sched;	/* tx schedule table */
  
 +	const HAL_RATE_TABLE *currates;
 +
  	struct rate_stats stats[NUM_PACKET_SIZE_BINS][SAMPLE_MAXRATES];
  	int last_sample_rix[NUM_PACKET_SIZE_BINS];
  
 
>Unformatted:
