From nobody@FreeBSD.org  Mon Jan  2 00:50:09 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B0C8E1065672
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  2 Jan 2012 00:50:09 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 7ABC68FC16
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  2 Jan 2012 00:50:09 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q020o9YZ047359
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 2 Jan 2012 00:50:09 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q020o9o1047358;
	Mon, 2 Jan 2012 00:50:09 GMT
	(envelope-from nobody)
Message-Id: <201201020050.q020o9o1047358@red.freebsd.org>
Date: Mon, 2 Jan 2012 00:50:09 GMT
From: Nathan Lay <nsl03@my.fsu.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: ath(4) "stops working" in hostap mode
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         163759
>Category:       kern
>Synopsis:       [ath] ath(4) "stops working" in hostap mode
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-wireless
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jan 02 01:00:23 UTC 2012
>Closed-Date:    
>Last-Modified:  Mon Jan  2 06:20:11 UTC 2012
>Originator:     Nathan Lay
>Release:        9-STABLE
>Organization:
>Environment:
FreeBSD RADIO.LOCAL 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #3: Sat Dec 31 20:52:54 EST 2011     nslay@RADIO.LOCAL:/usr/obj/usr/src/sys/RADIO  amd64
>Description:
At seemingly random times, ath(4) "stops working" while in hostap mode. The access point disappears from the view of other wireless clients, and neither bringing the interface down and up nor destroying and recreating it restores it. tcpdump confirms that the access point really is no longer visible. Reloading the driver, however, does remedy the problem. The affected device is:

ath0: <Atheros 5416> mem 0xfe9f0000-0xfe9fffff irq 16 at device 0.0 on pci1
ath0: AR5418 mac 12.10 RF2133 phy 8.1

Here is how it is configured and used:
create_args_wlan0="wlanmode hostap -bgscan"
ifconfig_wlan0="channel 5:ht/40 ssid Lamp up"
autobridge_bridge0="wlan0 lan0"

It also sits behind pf.

The kernel is not compiled with ATH_ENABLE_11N.

It is also worth mentioning that the aforementioned configuration worked without problems in 8.x.

Other suspicious behavior:
athstats before the problem:
 bexmit bmiss
   4410     0
     10     0
     10     0
      9     0

athstats after the problem:
 bexmit bmiss
  52014     0
      5     0
      4     0
      5     0
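
To put a number on the change in the two listings above, here is a minimal sketch that compares the per-interval bexmit samples. It assumes the first row of each athstats listing (4410 and 52014) is a running total and the remaining rows are per-interval counts; that reading is an assumption, not something the output itself states.

```python
# Per-interval 'bexmit' samples copied from the two athstats listings.
# Assumption: the first row of each listing is a running total, so it is skipped.
before = [10, 10, 9]
after = [5, 4, 5]


def mean(samples):
    """Arithmetic mean of a non-empty list of counts."""
    return sum(samples) / len(samples)


ratio = mean(after) / mean(before)

print(f"before: {mean(before):.2f} beacons per interval")
print(f"after:  {mean(after):.2f} beacons per interval")
print(f"after/before ratio: {ratio:.2f}")
```

Under that assumption, the beacon transmit rate after the problem is a little under half of what it was before.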

dmesg frequently reports stuck beacons, both before and after the problem:
ath0: stuck beacon; resetting (bmiss count 4)
 
Here is the output of athstats after the problem:
352222   data frames received
317354   data frames transmit
113      tx frames with an alternate rate
11861    long on-chip tx retries
755      tx failed 'cuz too many retries
691      stuck beacon conditions
1M       current transmit rate
5537     tx frames with no ack marked
309725   tx frames with short preamble
11773    rx failed 'cuz of bad CRC
3053     rx failed 'cuz of PHY err
    3053     CCK restart
52218    beacons transmitted
181      periodic calibrations
-0/+0    TDMA slot adjust (usecs, smoothed)
58       rssi of last ack
31       avg recv rssi
-96      rx noise floor
2092     tx frames through raw api
241      cabq frames transmitted
97       cabq xmit overflowed beacon interval
1        spur immunity level
54       ANI increased spur immunity
53       ANI decrease spur immunity
693      ANI enabled OFDM weak signal detect
693      ANI disabled CCK weak signal threshold
13947724 cumulative OFDM phy error count
13047675 cumulative CCK phy error count
902      ANI forced listen time to zero
11860    missing ACK's
21408    bad FCS
24       average rssi (beacons only)
Antenna profile:
[0] tx   316491 rx    23814
[1] tx        0 rx   328408

>How-To-Repeat:
No known way to repeat the problem. However, building the kernel with the following options:
options ATH_DEBUG
options AH_DEBUG
options ATH_DIAGAPI

seems to make the problem happen more frequently.
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-wireless 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Jan 2 05:18:43 UTC 2012 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=163759 

From: Adrian Chadd <adrian@freebsd.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/163759: [ath] ath(4) "stops working" in hostap mode
Date: Sun, 1 Jan 2012 22:16:21 -0800

 A little more digging has shown at least one source of these: software
 retries are sneaking onto the list.
 
 I.e.:
 
 * force 11n aggregation up and push a whole bunch of traffic;
 * enable debugging with sysctl dev.ath.1.debug=0x7c002000 (that's the
 SW TX handling bits and the TX_PROC debugging);
 * run ping -i 0.3 <ip> in one screen;
 * scan in the other (ifconfig wlan1 scan);
 * notice the tid_drain things being logged.
 
 What I've seen:
 
 * frame is queued via ath_start() or ath_raw_xmit()
 * .. it makes it out to the hardware
 * ath_tx_processq() is called in the flush routine, with dosched=0
 * .. and it requires a retry, for reasons I haven't yet figured out.
 Since aggregation is up, the frame is retried in software.
 * .. so the frame is put back on the software queue, but sched isn't
 called for it, so it just sits on the software queue.
 * .. then drain is called, with a software-queued frame in the queue.
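 
 The sequence above can be mimicked with a toy model. This is only a
 sketch of the described behaviour, not the driver's code: ToyTxQueue,
 swq, hwq, and needs_retry are hypothetical names, and the methods are
 loose stand-ins for the queue, completion, sched, and drain steps.

```python
from collections import deque


class ToyTxQueue:
    """Hypothetical model: one software queue feeding one hardware queue."""

    def __init__(self):
        self.swq = deque()  # frames waiting in software
        self.hwq = deque()  # frames handed to the "hardware"

    def sched(self):
        # Push every software-queued frame out to the hardware queue.
        while self.swq:
            self.hwq.append(self.swq.popleft())

    def queue_frame(self, frame):
        # Stand-in for the normal transmit path: queue, then schedule.
        self.swq.append(frame)
        self.sched()

    def processq(self, dosched, needs_retry):
        # Stand-in for TX completion: frames that need a retry go back
        # onto the software queue; the rest are considered done.
        while self.hwq:
            frame = self.hwq.popleft()
            if needs_retry(frame):
                self.swq.append(frame)
        # Only reschedule when asked to; the flush path passes dosched=0.
        if dosched:
            self.sched()

    def drain(self):
        # Return whatever is still sitting on the software queue.
        leftover = list(self.swq)
        self.swq.clear()
        return leftover


def run(dosched):
    q = ToyTxQueue()
    q.queue_frame("ping-1")
    q.processq(dosched=dosched, needs_retry=lambda frame: True)
    return q.drain()


print("dosched=0:", run(dosched=False))
print("dosched=1:", run(dosched=True))
```

 With dosched=0 the retried frame is still on the software queue when
 drain runs; with dosched=1 it has already been handed back to the
 hardware queue, so drain finds nothing.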
 
 So now, what should I do here? Hum.
 
 
 adrian
>Unformatted:
