From nobody@FreeBSD.org  Sun Dec  9 00:48:54 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id AE5657C0
	for <freebsd-gnats-submit@FreeBSD.org>; Sun,  9 Dec 2012 00:48:54 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 94B1C8FC0C
	for <freebsd-gnats-submit@FreeBSD.org>; Sun,  9 Dec 2012 00:48:54 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.5/8.14.5) with ESMTP id qB90msVo056783
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 9 Dec 2012 00:48:54 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.5/8.14.5/Submit) id qB90ms79056782;
	Sun, 9 Dec 2012 00:48:54 GMT
	(envelope-from nobody)
Message-Id: <201212090048.qB90ms79056782@red.freebsd.org>
Date: Sun, 9 Dec 2012 00:48:54 GMT
From: Adrian Chadd <adrian@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [net80211] panics in ieee80211_ff_age() and ieee80211_ff_flush()
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         174283
>Category:       kern
>Synopsis:       [net80211] panics in ieee80211_ff_age() and ieee80211_ff_flush()
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-wireless
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Dec 09 00:50:00 UTC 2012
>Closed-Date:    
>Last-Modified:  Mon Dec 10 00:26:43 UTC 2012
>Originator:     Adrian Chadd
>Release:        -HEAD
>Organization:
>Environment:
>Description:
There are panics in the net80211 fast-frame queue ageing and flushing code.

It looks like the staging queue ends up empty, and the net80211 FF routines have KASSERT()s to make sure the queue isn't empty.  I'm guessing it's a sanity check: these routines shouldn't be called when the queues are empty.

However, the check is done without the comlock being held, so it's entirely plausible that there's a race or preemption between the emptiness check and the code that actually ages/empties the queue.  Another thread (running on another CPU, or one that preempted us) empties the FF AC queue for us; once control returns to the original thread, the queue is empty, the KASSERT() fires and it panics.

kgdb analysis of a crashdump shows:

* ath_tx_processq()
* ieee80211_ff_flush()
* ieee80211_ff_age()

ieee80211_ff_flush() checks whether the queue is empty and, if it is not, calls ieee80211_ff_age().

The FF routines are called from a bunch of places, and these calls can and do overlap.
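
To make the suspected window concrete, here is a minimal userspace sketch of the check-then-act pattern.  It is not the net80211 code: struct stageq, model_ff_flush(), model_ff_age() and the hook are hypothetical stand-ins, and the "other thread" is simulated by a callback that runs between the check and the drain so the interleaving is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one fast-frame staging queue. */
struct stageq {
	int depth;		/* number of staged frames */
};

/* Hypothetical hook: runs in the window between check and drain. */
typedef void (*race_hook_t)(struct stageq *);

/* Models another thread emptying the queue inside that window. */
static void
drain_by_other_thread(struct stageq *sq)
{
	sq->depth = 0;
}

/*
 * Models the ageing routine.  Where the kernel code would KASSERT that
 * the queue is non-empty, this returns false so the failure can be
 * observed without aborting the program.
 */
static bool
model_ff_age(struct stageq *sq)
{
	assert(sq->depth >= 0);
	if (sq->depth == 0)
		return (false);	/* the KASSERT would fire here */
	sq->depth = 0;		/* drain everything */
	return (true);
}

/*
 * Models the flush wrapper: unlocked emptiness check, then the call.
 * Returns true if the call was skipped or the assertion would hold.
 */
static bool
model_ff_flush(struct stageq *sq, race_hook_t hook)
{
	if (sq->depth == 0)
		return (true);	/* nothing staged, nothing to do */
	if (hook != NULL)
		hook(sq);	/* preemption / other CPU runs here */
	return (model_ff_age(sq));
}
```

Calling model_ff_flush() with a NULL hook on a non-empty queue succeeds; passing drain_by_other_thread as the hook makes model_ff_age() see an empty queue, which is the case the KASSERT() would catch.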


>How-To-Repeat:
* run 9-stable or -HEAD with a kernel that has assertions (INVARIANTS) and WITNESS enabled;
* run iperf TCP between FF-capable stations and just wait a while; it'll eventually trigger!
>Fix:
The solutions?

* stick the ieee80211_ff_*() calls in a specific taskqueue and call them from there, rather than from the TX, RX and TX completion contexts;
* grab the comlock before checking, make sure the function expects the comlock to be held, and release the comlock afterwards;
* just accept (and document) that the check is racy/opportunistic, and remove the "is the queue empty?" KASSERT()s in the FF code.
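
As a rough illustration of the second and third options, here is a userspace sketch.  Everything in it is hypothetical: struct ff_queue, the ffq_*() names and the toy lock (a flag guarded by assert(), standing in for the comlock) are made up for this example and are not the net80211 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical staging queue with a toy lock standing in for the comlock. */
struct ff_queue {
	int depth;		/* number of staged frames */
	bool locked;		/* true while the toy lock is held */
};

/* Hypothetical hook: runs between the unlocked check and the lock. */
typedef void (*window_hook_t)(struct ff_queue *);

static void
ffq_lock(struct ff_queue *q)
{
	assert(!q->locked);
	q->locked = true;
}

static void
ffq_unlock(struct ff_queue *q)
{
	assert(q->locked);
	q->locked = false;
}

/* Drains the queue; the caller must hold the lock.  Empty is not an error. */
static int
ffq_age_locked(struct ff_queue *q)
{
	int n;

	assert(q->locked);
	n = q->depth;
	q->depth = 0;
	return (n);
}

/* Models a competing context that drains the queue under the lock. */
static void
ffq_drain_other(struct ff_queue *q)
{
	ffq_lock(q);
	(void)ffq_age_locked(q);
	ffq_unlock(q);
}

/* Second option: check and drain inside one critical section. */
static int
ffq_flush_locked_check(struct ff_queue *q)
{
	int n = 0;

	ffq_lock(q);
	if (q->depth != 0)
		n = ffq_age_locked(q);
	ffq_unlock(q);
	return (n);
}

/* Third option: the unlocked check is only a hint. */
static int
ffq_flush_opportunistic(struct ff_queue *q, window_hook_t hook)
{
	int n;

	if (q->depth == 0)
		return (0);	/* racy hint: probably nothing to do */
	if (hook != NULL)
		hook(q);	/* another context may drain the queue here */
	ffq_lock(q);
	n = ffq_age_locked(q);	/* returns 0 if we lost the race */
	ffq_unlock(q);
	return (n);
}
```

In the opportunistic variant, losing the race just means ffq_age_locked() drains zero frames instead of tripping an assertion; the locked-check variant closes the window entirely, at the cost of taking the lock on every call.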


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-wireless 
Responsible-Changed-By: adrian 
Responsible-Changed-When: Sun Dec 9 00:50:24 UTC 2012 
Responsible-Changed-Why:  
punt to maintainer list 


http://www.freebsd.org/cgi/query-pr.cgi?pr=174283 
>Unformatted:
