From nobody@FreeBSD.org  Fri May 10 09:50:20 2013
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by hub.freebsd.org (Postfix) with ESMTP id D73F484C
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 10 May 2013 09:50:20 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from oldred.FreeBSD.org (oldred.freebsd.org [8.8.178.121])
	by mx1.freebsd.org (Postfix) with ESMTP id AFE7CDF0
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 10 May 2013 09:50:20 +0000 (UTC)
Received: from oldred.FreeBSD.org ([127.0.1.6])
	by oldred.FreeBSD.org (8.14.5/8.14.5) with ESMTP id r4A9oKuq091257
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 10 May 2013 09:50:20 GMT
	(envelope-from nobody@oldred.FreeBSD.org)
Received: (from nobody@localhost)
	by oldred.FreeBSD.org (8.14.5/8.14.5/Submit) id r4A9oKXr091256;
	Fri, 10 May 2013 09:50:20 GMT
	(envelope-from nobody)
Message-Id: <201305100950.r4A9oKXr091256@oldred.FreeBSD.org>
Date: Fri, 10 May 2013 09:50:20 GMT
From: adrian chadd <adrian@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ath] missed beacon / soft reset in STA mode results in hardware error and DMA engine lockup
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         178477
>Category:       kern
>Synopsis:       [ath] missed beacon / soft reset in STA mode results in hardware error and DMA engine lockup
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-wireless
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 10 10:00:00 UTC 2013
>Closed-Date:    
>Last-Modified:  Fri May 10 18:46:48 UTC 2013
>Originator:     adrian chadd
>Release:        -HEAD
>Organization:
>Environment:
>Description:
With my most recent changes to the ath(4) TX DMA list handling (i.e., writing a new TxDP entry for a queue only for the first frame sent after a reset, then always using the holding descriptor and link pointer for subsequent frames), I've uncovered a rather annoying bug.

If a no-loss reset is done (i.e., one where no packets are lost), the hardware ends up locking up.

This is triggerable in STA mode. AP mode doesn't (for now) seem to be a problem.

What's seen:

ath0: hardware error; resetting
ath0: 0x00000000 0x00000020 0x00000000, 0x00000000 0x00000000 0x00000000
ar5416StopDmaReceive: dma failed to stop in 10ms
AR_CR=0x00000024
AR_DIAG_SW=0x42000020

After this point, no combination of soft or hard chip reset unlocks the DMA engine.

When reset debugging is enabled, the queue looks like this:

ath0: ath_tx_stopdma: tx queue [3] 0, active=1, hwpending=1, flags 0x00000000, link 0x<ptr>

As far as I'm aware, the TX queue TxDP should never be 0x0 if it's active.
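
As a rough illustration of that invariant, here is a minimal sketch in C. The struct and function names (model_txq_state, model_txq_state_sane) are hypothetical and only model the values printed in the debug line above; they are not part of the ath(4) driver or the HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical snapshot of what the reset debugging line reports for one
 * TX queue. This is a model for illustration, not a driver structure.
 */
struct model_txq_state {
	uint32_t txdp;		/* TxDP value read back from the hardware */
	bool active;		/* queue reported as active */
	unsigned int hwpending;	/* frames the hardware reports as pending */
};

/*
 * Return false when the queue claims to be active (or to have frames
 * pending) while TxDP is 0x0, which is the state seen before the lockup.
 */
static bool
model_txq_state_sane(const struct model_txq_state *st)
{
	if ((st->active || st->hwpending != 0) && st->txdp == 0)
		return (false);
	return (true);
}
```

With the values from the log line (TxDP 0, active=1, hwpending=1) this check returns false; an idle queue with TxDP 0 passes.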

Anyway. This is easy to reproduce.
>How-To-Repeat:
* Insert AR5416 card
* Create STA vap
* Associate to AP
* Force a 'stuck beacon' no-loss reset - sysctl dev.ath.X.forcebstuck=1
* .. the next transmission will cause a hardware error.

>Fix:
Not sure yet. There aren't many things that can go wrong here:

* is there a frame on the TXQ that's actually already been freed?
* is the holding descriptor not being freed during a soft reset?
* .. and what about the link pointer? It should be set to NULL during reset; the DMA restart routine should then re-initialise it to point at the last descriptor of the last frame in the list, or to NULL if the list is empty.

Actually, I just hacked on the DMA restart code to ensure that the link pointer is either initialised to the last descriptor in the list or NULL. That seems to have fixed it. So, the reset path isn't freeing the holding descriptor or NULL'ing the axq_link pointer.
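
To make the rule concrete, here is a minimal model in C of what the reset and restart paths should leave behind. Everything here (model_desc, model_buf, model_txq, model_txq_reset_noloss, model_txq_restart_link) is a hypothetical simplification for illustration, not the actual ath(4) structures or the change I made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real ath(4) structures. */
struct model_desc {
	uint32_t ds_link;		/* next-descriptor address for the hardware */
};

struct model_buf {
	struct model_desc *bf_lastds;	/* last descriptor of this frame */
	struct model_buf *bf_next;	/* next frame on the software queue */
};

struct model_txq {
	struct model_buf *head;		/* first queued frame, or NULL */
	struct model_buf *holdingbf;	/* holding buffer, or NULL */
	uint32_t *link;			/* where the next frame gets chained */
};

/*
 * No-loss reset, as modelled here: DMA is stopped, so drop the holding
 * buffer and clear the link pointer so nothing is chained to a stale
 * descriptor.
 */
static void
model_txq_reset_noloss(struct model_txq *txq)
{
	txq->holdingbf = NULL;
	txq->link = NULL;
}

/*
 * DMA restart: point the link pointer at the last descriptor of the last
 * frame in the list, or leave it NULL if the list is empty.
 */
static void
model_txq_restart_link(struct model_txq *txq)
{
	struct model_buf *bf, *last = NULL;

	for (bf = txq->head; bf != NULL; bf = bf->bf_next)
		last = bf;
	txq->link = (last != NULL) ? &last->bf_lastds->ds_link : NULL;
}
```

The point of the model is that after a reset the link pointer is never left pointing at whatever was there before; it is either the tail of the current list or NULL.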

Fix that!

>Release-Note:
>Audit-Trail:

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/178477: commit references a PR
Date: Fri, 10 May 2013 10:06:53 +0000 (UTC)

 Author: adrian
 Date: Fri May 10 10:06:45 2013
 New Revision: 250444
 URL: http://svnweb.freebsd.org/changeset/base/250444
 
 Log:
   Make sure the holding descriptor and link pointer are both freed during
   a non-loss reset.
   
   When the drain functions are called, the holding descriptor and link pointers
   are NULLed out.
   
   But when the processq function is called during a non-loss reset, this
   doesn't occur.  So the next time a DMA occurs, it's chained to a descriptor
   that no longer exists and the hardware gets angry.
   
   Tested:
   
   * AR5416, STA mode; use sysctl dev.ath.X.forcebstuck=1 to force a non-loss
     reset.
   
   TODO:
   
   * Further AR9380 testing just to check that the behaviour for the EDMA
     chips is sane.
   
   PR:		kern/178477
 
 Modified:
   head/sys/dev/ath/if_ath.c
   head/sys/dev/ath/if_ath_tx_edma.c
 
 Modified: head/sys/dev/ath/if_ath.c
 ==============================================================================
 --- head/sys/dev/ath/if_ath.c	Fri May 10 09:58:32 2013	(r250443)
 +++ head/sys/dev/ath/if_ath.c	Fri May 10 10:06:45 2013	(r250444)
 @@ -4668,9 +4668,21 @@ ath_legacy_tx_drain(struct ath_softc *sc
  			if (sc->sc_debug & ATH_DEBUG_RESET)
  				ath_tx_dump(sc, &sc->sc_txq[i]);
  #endif	/* ATH_DEBUG */
 -			if (reset_type == ATH_RESET_NOLOSS)
 +			if (reset_type == ATH_RESET_NOLOSS) {
  				ath_tx_processq(sc, &sc->sc_txq[i], 0);
 -			else
 +				ATH_TXQ_LOCK(&sc->sc_txq[i]);
 +				/*
 +				 * Free the holding buffer; DMA is now
 +				 * stopped.
 +				 */
 +				ath_txq_freeholdingbuf(sc, &sc->sc_txq[i]);
 +				/*
 +				 * Reset the link pointer to NULL; there's
 +				 * no frames to chain DMA to.
 +				 */
 +				sc->sc_txq[i].axq_link = NULL;
 +				ATH_TXQ_UNLOCK(&sc->sc_txq[i]);
 +			} else
  				ath_tx_draintxq(sc, &sc->sc_txq[i]);
  		}
  	}
 
 Modified: head/sys/dev/ath/if_ath_tx_edma.c
 ==============================================================================
 --- head/sys/dev/ath/if_ath_tx_edma.c	Fri May 10 09:58:32 2013	(r250443)
 +++ head/sys/dev/ath/if_ath_tx_edma.c	Fri May 10 10:06:45 2013	(r250444)
 @@ -551,6 +551,22 @@ ath_edma_tx_drain(struct ath_softc *sc, 
  	 */
  	if (reset_type == ATH_RESET_NOLOSS) {
  		ath_edma_tx_processq(sc, 0);
 +		for (i = 0; i < HAL_NUM_TX_QUEUES; i++) {
 +			if (ATH_TXQ_SETUP(sc, i)) {
 +				ATH_TXQ_LOCK(&sc->sc_txq[i]);
 +				/*
 +				 * Free the holding buffer; DMA is now
 +				 * stopped.
 +				 */
 +				ath_txq_freeholdingbuf(sc, &sc->sc_txq[i]);
 +				/*
 +				 * Reset the link pointer to NULL; there's
 +				 * no frames to chain DMA to.
 +				 */
 +				sc->sc_txq[i].axq_link = NULL;
 +				ATH_TXQ_UNLOCK(&sc->sc_txq[i]);
 +			}
 +		}
  	} else {
  		for (i = 0; i < HAL_NUM_TX_QUEUES; i++) {
  			if (ATH_TXQ_SETUP(sc, i))
 
Responsible-Changed-From-To: freebsd-bugs->freebsd-wireless 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Fri May 10 18:46:41 UTC 2013 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=178477 
>Unformatted:
