From nobody@FreeBSD.org  Sun Jan 22 19:45:30 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 58A6C1065676
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 22 Jan 2012 19:45:30 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 32BEF8FC23
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 22 Jan 2012 19:45:30 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q0MJjUZE046359
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 22 Jan 2012 19:45:30 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q0MJjUjI046358;
	Sun, 22 Jan 2012 19:45:30 GMT
	(envelope-from nobody)
Message-Id: <201201221945.q0MJjUjI046358@red.freebsd.org>
Date: Sun, 22 Jan 2012 19:45:30 GMT
From: Adrian Chadd <adrian@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ath] crash when down/deleting a vap - inside ieee80211_input_mimo_all()
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         164382
>Category:       kern
>Synopsis:       [ath] crash when down/deleting a vap - inside ieee80211_input_mimo_all()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-wireless
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jan 22 19:50:10 UTC 2012
>Closed-Date:    
>Last-Modified:  Sun Jan 22 22:26:16 UTC 2012
>Originator:     Adrian Chadd
>Release:        9.0-RC2, with -HEAD ath/net80211
>Organization:
FreeBSD
>Environment:
FreeBSD marilyn 9.0-RC3-p1 FreeBSD 9.0-RC3-p1 #4: Sat Jan 21 20:56:40 PST 2012     root@marilyn:/usr/src/sys/i386/compile/MARILYN  i386

>Description:
I saw a crash inside the net80211 stack when either deleting or down'ing a vap.


Unread portion of the kernel message buffer:
KDB: stack backtrace:
#0 0xc0727697 at kdb_backtrace+0x47
#1 0xc073b675 at _witness_debugger+0x25
#2 0xc073cb8e at witness_warn+0x1fe
#3 0xc095e465 at trap+0x195
#4 0xc09478ac at calltrap+0x6
#5 0xc77e2bf1 at ieee80211_free_node_debug+0xb1
#6 0xc77ce017 at ieee80211_input_mimo_all+0xe7
#7 0xc77cdf22 at ieee80211_input_all+0x32
#8 0xc784dcc5 at ath_rx_proc+0xc45
#9 0xc784d071 at ath_rx_tasklet+0x101
#10 0xc073446b at taskqueue_run_locked+0xeb
#11 0xc0734ec7 at taskqueue_thread_loop+0x67
#12 0xc06c76b8 at fork_exit+0xb8
#13 0xc0947924 at fork_trampoline+0x8

The debugging indicated something rather amusing at this point.


ath0: ath_node_alloc: an 0xc7adf000
ieee80211_ref_node: 0xc7adf000: ieee80211_reset_bss /usr/home/adrian/work/freebsd/ath/head/src/sys/modules/wlan/../../net80211/ieee80211_node.c:434
wlan0: Ethernet address: 00:03:7f:11:a3:f3
ath0: ath_init: if_flags 0x8803
ath0: ath_stop_locked: invalid 0 if_flags 0x8803
ath0: ath_newstate: INIT -> INIT
ath0: ath_newstate: RX filter 0x6497 bssid 00:00:00:00:00:00 aid 0x0
ath0: ath_newstate: INIT -> SCAN
ath0: ath_newstate: RX filter 0x6497 bssid 00:00:00:00:00:00 aid 0x0
ath0: ath_node_alloc: an 0xc7aea000

At this point there are two sets of messages whose output is interleaved, indicating that they ran concurrently:

ieee80211_ref_node: 0xc7aea000: ieee80211_create_ibss /usr/home/adrian/work/freebsd/ath/head/src/sys/modules/
wlan/../../net802

ieee80211_ref_node: 0xc7adf000: ieee80211_input_mimo_all /usr/home/adrian/work/freebsd/ath/head/src/sys/modul
es/wlan/../../net

11/ieee80211_node.c:412

80211/ieee80211_input.c:143
ath0: ath_node_free: ni 0xc7adf000

And bang:

Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex ath0_node_lock (ath0_node_lock) r = 0 (0xc79316c0) locked @ /usr/home/adrian/work/freebsd/ath/head/src/sys/modules/wlan/../../net80211/ieee80211_node.c:1702

I'm gathering here that the delete was ongoing whilst traffic was being processed via ath_rx_tasklet(), and that either the underlying vap was deleted or its vap->iv_bss node was changed.

There seems to be a larger class of bugs where the vap->iv_bss node is changed in parallel with some other process (e.g. beacon free/alloc) without suitable locking.
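
As an illustration of what "suitable locking" could mean here, the following is a minimal userland sketch, not net80211 code. Every name in it (node_model, vap_model, vap_ref_bss() and so on) is hypothetical, and a pthread mutex stands in for whatever lock the real code would use. The idea is that a reader takes its reference on the bss node under the same lock the writer holds while swapping the pointer, so the old node cannot be freed while the reader is still using it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model types; these are NOT the net80211 structures. */
struct node_model {
	int refcnt;	/* protected by vap_model.lock */
	int id;		/* illustration only */
};

struct vap_model {
	pthread_mutex_t lock;
	struct node_model *bss;	/* plays the role of vap->iv_bss */
};

/* Allocate a node holding one reference (the one the vap will own). */
struct node_model *node_model_alloc(int id)
{
	struct node_model *n = malloc(sizeof(*n));

	if (n != NULL) {
		n->refcnt = 1;
		n->id = id;
	}
	return n;
}

void vap_model_init(struct vap_model *v, struct node_model *bss)
{
	pthread_mutex_init(&v->lock, NULL);
	v->bss = bss;
}

/* Reader side: read the pointer and take a reference in one locked step. */
struct node_model *vap_ref_bss(struct vap_model *v)
{
	struct node_model *n;

	pthread_mutex_lock(&v->lock);
	n = v->bss;
	if (n != NULL)
		n->refcnt++;
	pthread_mutex_unlock(&v->lock);
	return n;
}

/* Drop one reference; returns 1 if this call freed the node. */
int node_model_release(struct vap_model *v, struct node_model *n)
{
	int freed;

	pthread_mutex_lock(&v->lock);
	assert(n->refcnt > 0);
	freed = (--n->refcnt == 0);
	pthread_mutex_unlock(&v->lock);
	if (freed)
		free(n);
	return freed;
}

/* Writer side: swap the bss pointer, then drop the vap's old reference. */
void vap_swap_bss(struct vap_model *v, struct node_model *newbss)
{
	struct node_model *old;

	pthread_mutex_lock(&v->lock);
	old = v->bss;
	v->bss = newbss;
	pthread_mutex_unlock(&v->lock);
	if (old != NULL)
		node_model_release(v, old);
}
```

In this model a reader that called vap_ref_bss() before the swap still holds a valid node afterwards, and the node is only freed when that reader releases it.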

>How-To-Repeat:
It's difficult to reproduce. I reproduced it in a lab environment with lots of busy air. I guess anything that triggers constant incoming traffic and keeps the RX queue deep is going to make this bug easier to trigger.

What needs to happen:

* ath_rx_tasklet() needs to take a while to run;
* the ifconfig process (and net80211 taskqueue) needs to be scheduled on another CPU, so it can run _in parallel_ with the ath taskqueue (which ath_rx_tasklet() runs in);
* somehow you have to get a vap down/delete in during this RX.

>Fix:

I think the RX path should be properly aborted during a vap down/delete. This doesn't just mean stopping the hardware (which is what ath_stop_locked() currently does) but also waiting for ath_rx_tasklet() and the TX completion tasklet to complete.
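
The "stop, then wait for in-flight work" ordering described above can be sketched in userland C. This is only a model under stated assumptions: the rx_model type and its functions are hypothetical, pthread primitives stand in for kernel ones, and nothing here is taken from ath(4). In the kernel, taskqueue_drain(9) is the kind of primitive that waits for a queued task to finish; whether it fits the ath(4) locking rules is not established here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of an RX path; NOT the ath(4) softc. */
struct rx_model {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when in_flight reaches zero */
	bool stopping;		/* set by the down/delete path */
	int in_flight;		/* RX passes currently running */
	int processed;		/* completed passes, illustration only */
};

void rx_model_init(struct rx_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->idle, NULL);
	m->stopping = false;
	m->in_flight = 0;
	m->processed = 0;
}

/* Start of an RX pass; returns false once teardown has begun. */
bool rx_model_enter(struct rx_model *m)
{
	bool ok;

	pthread_mutex_lock(&m->lock);
	ok = !m->stopping;
	if (ok)
		m->in_flight++;
	pthread_mutex_unlock(&m->lock);
	return ok;
}

/* End of an RX pass; wakes the drain waiter when the last pass finishes. */
void rx_model_exit(struct rx_model *m)
{
	pthread_mutex_lock(&m->lock);
	assert(m->in_flight > 0);
	m->processed++;
	if (--m->in_flight == 0)
		pthread_cond_broadcast(&m->idle);
	pthread_mutex_unlock(&m->lock);
}

/* Down/delete path: refuse new passes, then wait for running ones. */
void rx_model_stop_and_drain(struct rx_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->stopping = true;
	while (m->in_flight > 0)
		pthread_cond_wait(&m->idle, &m->lock);
	pthread_mutex_unlock(&m->lock);
}
```

Only after rx_model_stop_and_drain() returns would the teardown path in this model go on to free state that an RX pass might touch.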


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-wireless 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun Jan 22 22:26:05 UTC 2012 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=164382 
>Unformatted:
