From nobody@FreeBSD.org  Fri Feb 17 02:50:11 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63E4A106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 17 Feb 2012 02:50:11 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 51C9D8FC14
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 17 Feb 2012 02:50:11 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q1H2oB6v022659
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 17 Feb 2012 02:50:11 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q1H2oB7u022658;
	Fri, 17 Feb 2012 02:50:11 GMT
	(envelope-from nobody)
Message-Id: <201202170250.q1H2oB7u022658@red.freebsd.org>
Date: Fri, 17 Feb 2012 02:50:11 GMT
From: Adrian Chadd <adrian@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ath] "ath_rx_tasklet: sc_inreset_cnt > 0; skipping" messages
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         165220
>Category:       kern
>Synopsis:       [ath] "ath_rx_tasklet: sc_inreset_cnt > 0; skipping" messages
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-wireless
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Feb 17 03:00:21 UTC 2012
>Closed-Date:    
>Last-Modified:  Sat Feb 25 19:20:13 UTC 2012
>Originator:     Adrian Chadd
>Release:        9.0-RELEASE i386, with -HEAD net80211/ath
>Organization:
>Environment:
>Description:
There are far too many of the following messages showing up:


[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_start: sc_inreset_cnt > 0; bailing
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping

. What's happening is that a bunch of TX or RX completion is occuring at the same time as a channel change/reset. The previous code just had these run concurrently, causing all kinds of random and weird behaviour.

I introduced some debugging in FreeBSD-HEAD that tries to (a) stop these and (b) log when they occur, so I could actually try finding/figuring out what's going on.

This is one of these cases. :-)

>How-To-Repeat:
Just run ath :-)
>Fix:
The problem is that although interrupts are disabled in reset, the ath taskqueue may currently be running and this means existing TX completion and RX will occur.

I think the fix is to do this:

* grab the PCU lock
* disable interrupts
* grab the reset counter
* stop the ath taskqueue and wait for pending tasks to complete

. but is this still racy? A patch I've done to do the above in ath_reset() and ath_chan_change() reduces the instances of this quite heavily but it still occasionally turns up.

>Release-Note:
>Audit-Trail:

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/165220: commit references a PR
Date: Fri, 17 Feb 2012 03:23:16 +0000 (UTC)

 Author: adrian
 Date: Fri Feb 17 03:23:01 2012
 New Revision: 231854
 URL: http://svn.freebsd.org/changeset/base/231854
 
 Log:
   Begin breaking out the txrx stop code into a locked and unlocked variant.
   
   PR:	kern/165220
 
 Modified:
   head/sys/dev/ath/if_ath.c
 
 Modified: head/sys/dev/ath/if_ath.c
 ==============================================================================
 --- head/sys/dev/ath/if_ath.c	Fri Feb 17 03:01:29 2012	(r231853)
 +++ head/sys/dev/ath/if_ath.c	Fri Feb 17 03:23:01 2012	(r231854)
 @@ -1903,15 +1903,16 @@ ath_stop_locked(struct ifnet *ifp)
  
  #define	MAX_TXRX_ITERATIONS	1000
  static void
 -ath_txrx_stop(struct ath_softc *sc)
 +ath_txrx_stop_locked(struct ath_softc *sc)
  {
  	int i = MAX_TXRX_ITERATIONS;
  
  	ATH_UNLOCK_ASSERT(sc);
 +	ATH_PCU_LOCK_ASSERT(sc);
 +
  	/* Stop any new TX/RX from occuring */
  	taskqueue_block(sc->sc_tq);
  
 -	ATH_PCU_LOCK(sc);
  	/*
  	 * Sleep until all the pending operations have completed.
  	 *
 @@ -1925,7 +1926,6 @@ ath_txrx_stop(struct ath_softc *sc)
  		msleep(sc, &sc->sc_pcu_mtx, 0, "ath_txrx_stop", 1);
  		i--;
  	}
 -	ATH_PCU_UNLOCK(sc);
  
  	if (i <= 0)
  		device_printf(sc->sc_dev,
 @@ -1935,6 +1935,17 @@ ath_txrx_stop(struct ath_softc *sc)
  #undef	MAX_TXRX_ITERATIONS
  
  static void
 +ath_txrx_stop(struct ath_softc *sc)
 +{
 +	ATH_UNLOCK_ASSERT(sc);
 +	ATH_PCU_UNLOCK_ASSERT(sc);
 +
 +	ATH_PCU_LOCK(sc);
 +	ath_txrx_stop_locked(sc);
 +	ATH_PCU_UNLOCK(sc);
 +}
 +
 +static void
  ath_txrx_start(struct ath_softc *sc)
  {
  
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/165220: commit references a PR
Date: Fri, 17 Feb 2012 03:46:48 +0000 (UTC)

 Author: adrian
 Date: Fri Feb 17 03:46:38 2012
 New Revision: 231857
 URL: http://svn.freebsd.org/changeset/base/231857
 
 Log:
   Enforce some consistent ordering and handling of interrupt disable/enable
   with RX/TX halting.
   
   * Always disable/enable interrupts during a channel change, just to simply
     things.
   
   * Ensure that the ath taskqueue has completed and is paused before
     continuing.
   
   This dramatically reduces the instances of overlapping RX and reset
   conditions.
   
   PR:	kern/165220
 
 Modified:
   head/sys/dev/ath/if_ath.c
 
 Modified: head/sys/dev/ath/if_ath.c
 ==============================================================================
 --- head/sys/dev/ath/if_ath.c	Fri Feb 17 03:39:06 2012	(r231856)
 +++ head/sys/dev/ath/if_ath.c	Fri Feb 17 03:46:38 2012	(r231857)
 @@ -1934,6 +1934,7 @@ ath_txrx_stop_locked(struct ath_softc *s
  }
  #undef	MAX_TXRX_ITERATIONS
  
 +#if 0
  static void
  ath_txrx_stop(struct ath_softc *sc)
  {
 @@ -1944,6 +1945,7 @@ ath_txrx_stop(struct ath_softc *sc)
  	ath_txrx_stop_locked(sc);
  	ATH_PCU_UNLOCK(sc);
  }
 +#endif
  
  static void
  ath_txrx_start(struct ath_softc *sc)
 @@ -2049,11 +2051,12 @@ ath_reset(struct ifnet *ifp, ATH_RESET_T
  	ATH_UNLOCK_ASSERT(sc);
  
  	ATH_PCU_LOCK(sc);
 +	ath_hal_intrset(ah, 0);		/* disable interrupts */
 +	ath_txrx_stop_locked(sc);	/* Ensure TX/RX is stopped */
  	if (ath_reset_grablock(sc, 1) == 0) {
  		device_printf(sc->sc_dev, "%s: concurrent reset! Danger!\n",
  		    __func__);
  	}
 -	ath_hal_intrset(ah, 0);		/* disable interrupts */
  	ATH_PCU_UNLOCK(sc);
  
  	/*
 @@ -2061,7 +2064,6 @@ ath_reset(struct ifnet *ifp, ATH_RESET_T
  	 * and block future ones from occuring. This needs to be
  	 * done before the TX queue is drained.
  	 */
 -	ath_txrx_stop(sc);
  	ath_draintxq(sc, reset_type);	/* stop xmit side */
  
  	/*
 @@ -5383,21 +5385,16 @@ ath_chan_set(struct ath_softc *sc, struc
  	struct ieee80211com *ic = ifp->if_l2com;
  	struct ath_hal *ah = sc->sc_ah;
  	int ret = 0;
 -	int dointr = 0;
  
  	/* Treat this as an interface reset */
  	ATH_PCU_LOCK(sc);
 +	ath_hal_intrset(ah, 0);		/* Stop new RX/TX completion */
 +	ath_txrx_stop_locked(sc);	/* Stop pending RX/TX completion */
  	if (ath_reset_grablock(sc, 1) == 0) {
  		device_printf(sc->sc_dev, "%s: concurrent reset! Danger!\n",
  		    __func__);
  	}
 -	if (chan != sc->sc_curchan) {
 -		dointr = 1;
 -		/* XXX only do this if inreset_cnt is 1? */
 -		ath_hal_intrset(ah, 0);
 -	}
  	ATH_PCU_UNLOCK(sc);
 -	ath_txrx_stop(sc);
  
  	DPRINTF(sc, ATH_DEBUG_RESET, "%s: %u (%u MHz, flags 0x%x)\n",
  	    __func__, ieee80211_chan2ieee(ic, chan),
 @@ -5466,10 +5463,10 @@ ath_chan_set(struct ath_softc *sc, struc
  			ath_beacon_config(sc, NULL);
  		}
  
 -#if 0
  		/*
  		 * Re-enable interrupts.
  		 */
 +#if 0
  		ath_hal_intrset(ah, sc->sc_imask);
  #endif
  	}
 @@ -5478,8 +5475,7 @@ finish:
  	ATH_PCU_LOCK(sc);
  	sc->sc_inreset_cnt--;
  	/* XXX only do this if sc_inreset_cnt == 0? */
 -	if (dointr)
 -		ath_hal_intrset(ah, sc->sc_imask);
 +	ath_hal_intrset(ah, sc->sc_imask);
  	ATH_PCU_UNLOCK(sc);
  
  	/* XXX do this inside of IF_LOCK? */
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
Responsible-Changed-From-To: freebsd-bugs->freebsd-wireless 
Responsible-Changed-By: adrian 
Responsible-Changed-When: Fri Feb 17 07:23:32 UTC 2012 
Responsible-Changed-Why:  
Change 


http://www.freebsd.org/cgi/query-pr.cgi?pr=165220 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/165220: commit references a PR
Date: Sat, 25 Feb 2012 19:13:10 +0000 (UTC)

 Author: adrian
 Date: Sat Feb 25 19:12:54 2012
 New Revision: 232163
 URL: http://svn.freebsd.org/changeset/base/232163
 
 Log:
   Attempt to further fix some of the concurrency/reset issues that occur.
   
   * ath_reset() is being called in softclock context, which may have the
     thing sleep on a lock.  To avoid this, since we really _shouldn't_
     be sleeping on any locks, break out the no-loss reset path into a tasklet
     and call that from:
   
     + ath_calibrate()
     + ath_watchdog()
   
     This has the added advantage that it'll end up also doing the frame
     RX cleanup from within the taskqueue context, rather than the softclock
     context.
   
   * Shuffle around the taskqueue_block() call to be before we grab the lock
     and disable interrupts.
   
     The trouble here is that taskqueue_block() doesn't block currently
     queued (but not yet running) tasks so calling it doesn't guarantee
     no further tasks (that weren't running on _A_ CPU at the time of this
     call) will complete.  Calling taskqueue_drain() on these tasks won't
     work because if any _other_ thread calls taskqueue_enqueue() for whatever
     reason, everything gets very angry and stops working.
   
     This slightly changes the race condition enough to let ath_rx_tasklet()
     run before we try disabling it, and thus quietens the warnings a bit.
   
     The (more) true solution will be doing something like the following:
   
     * having a taskqueue_blocked mask in ath_softc;
     * having an interrupt_blocked mask in ath_softc;
     * only calling taskqueue_drain() on each individual task _after_ the
       lock has been acquired - that way no further tasklet scheduling
       is going to occur.
     * Then once the tasks have been blocked _and_ the interrupt has been
       disabled, call taskqueue_drain() on each, ensuring that anything
       that _was_ scheduled or running is removed.
   
     The trouble is if something calls taskqueue_enqueue() on a task
     after taskqueue_blocked() has been called but BEFORE taskqueue_drain()
     has been called, ta_pending will be set to 1 and taskqueue_drain()
     will sit there stuck in msleep() until you hard-kill the machine.
   
   PR: kern/165382
   PR: kern/165220
 
 Modified:
   head/sys/dev/ath/if_ath.c
   head/sys/dev/ath/if_athvar.h
 
 Modified: head/sys/dev/ath/if_ath.c
 ==============================================================================
 --- head/sys/dev/ath/if_ath.c	Sat Feb 25 18:48:06 2012	(r232162)
 +++ head/sys/dev/ath/if_ath.c	Sat Feb 25 19:12:54 2012	(r232163)
 @@ -159,6 +159,7 @@ static void	ath_beacon_proc(void *, int)
  static struct ath_buf *ath_beacon_generate(struct ath_softc *,
  			struct ieee80211vap *);
  static void	ath_bstuck_proc(void *, int);
 +static void	ath_reset_proc(void *, int);
  static void	ath_beacon_return(struct ath_softc *, struct ath_buf *);
  static void	ath_beacon_free(struct ath_softc *);
  static void	ath_beacon_config(struct ath_softc *, struct ieee80211vap *);
 @@ -392,6 +393,7 @@ ath_attach(u_int16_t devid, struct ath_s
  	TASK_INIT(&sc->sc_rxtask, 0, ath_rx_tasklet, sc);
  	TASK_INIT(&sc->sc_bmisstask, 0, ath_bmiss_proc, sc);
  	TASK_INIT(&sc->sc_bstucktask,0, ath_bstuck_proc, sc);
 +	TASK_INIT(&sc->sc_resettask,0, ath_reset_proc, sc);
  
  	/*
  	 * Allocate hardware transmit queues: one queue for
 @@ -1910,9 +1912,6 @@ ath_txrx_stop_locked(struct ath_softc *s
  	ATH_UNLOCK_ASSERT(sc);
  	ATH_PCU_LOCK_ASSERT(sc);
  
 -	/* Stop any new TX/RX from occuring */
 -	taskqueue_block(sc->sc_tq);
 -
  	/*
  	 * Sleep until all the pending operations have completed.
  	 *
 @@ -2050,6 +2049,9 @@ ath_reset(struct ifnet *ifp, ATH_RESET_T
  	ATH_PCU_UNLOCK_ASSERT(sc);
  	ATH_UNLOCK_ASSERT(sc);
  
 +	/* Try to (stop any further TX/RX from occuring */
 +	taskqueue_block(sc->sc_tq);
 +
  	ATH_PCU_LOCK(sc);
  	ath_hal_intrset(ah, 0);		/* disable interrupts */
  	ath_txrx_stop_locked(sc);	/* Ensure TX/RX is stopped */
 @@ -3163,6 +3165,23 @@ ath_beacon_start_adhoc(struct ath_softc 
  }
  
  /*
 + * Reset the hardware, with no loss.
 + *
 + * This can't be used for a general case reset.
 + */
 +static void
 +ath_reset_proc(void *arg, int pending)
 +{
 +	struct ath_softc *sc = arg;
 +	struct ifnet *ifp = sc->sc_ifp;
 +
 +#if 0
 +	if_printf(ifp, "%s: resetting\n", __func__);
 +#endif
 +	ath_reset(ifp, ATH_RESET_NOLOSS);
 +}
 +
 +/*
   * Reset the hardware after detecting beacons have stopped.
   */
  static void
 @@ -5387,6 +5406,12 @@ ath_chan_set(struct ath_softc *sc, struc
  	int ret = 0;
  
  	/* Treat this as an interface reset */
 +	ATH_PCU_UNLOCK_ASSERT(sc);
 +	ATH_UNLOCK_ASSERT(sc);
 +
 +	/* (Try to) stop TX/RX from occuring */
 +	taskqueue_block(sc->sc_tq);
 +
  	ATH_PCU_LOCK(sc);
  	ath_hal_intrset(ah, 0);		/* Stop new RX/TX completion */
  	ath_txrx_stop_locked(sc);	/* Stop pending RX/TX completion */
 @@ -5526,18 +5551,10 @@ ath_calibrate(void *arg)
  			DPRINTF(sc, ATH_DEBUG_CALIBRATE,
  				"%s: rfgain change\n", __func__);
  			sc->sc_stats.ast_per_rfgain++;
 -			/*
 -			 * Drop lock - we can't hold it across the
 -			 * ath_reset() call. Instead, we'll drop
 -			 * out here, do a reset, then reschedule
 -			 * the callout.
 -			 */
 -			callout_reset(&sc->sc_cal_ch, 1, ath_calibrate, sc);
  			sc->sc_resetcal = 0;
  			sc->sc_doresetcal = AH_TRUE;
 -			ATH_UNLOCK(sc);
 -			ath_reset(ifp, ATH_RESET_NOLOSS);
 -			ATH_LOCK(sc);
 +			taskqueue_enqueue(sc->sc_tq, &sc->sc_resettask);
 +			callout_reset(&sc->sc_cal_ch, 1, ath_calibrate, sc);
  			return;
  		}
  		/*
 @@ -6187,11 +6204,12 @@ ath_watchdog(void *arg)
  
  	/*
  	 * We can't hold the lock across the ath_reset() call.
 +	 *
 +	 * And since this routine can't hold a lock and sleep,
 +	 * do the reset deferred.
  	 */
  	if (do_reset) {
 -		ATH_UNLOCK(sc);
 -		ath_reset(sc->sc_ifp, ATH_RESET_NOLOSS);
 -		ATH_LOCK(sc);
 +		taskqueue_enqueue(sc->sc_tq, &sc->sc_resettask);
  	}
  
  	callout_schedule(&sc->sc_wd_ch, hz);
 
 Modified: head/sys/dev/ath/if_athvar.h
 ==============================================================================
 --- head/sys/dev/ath/if_athvar.h	Sat Feb 25 18:48:06 2012	(r232162)
 +++ head/sys/dev/ath/if_athvar.h	Sat Feb 25 19:12:54 2012	(r232163)
 @@ -501,6 +501,7 @@ struct ath_softc {
  	struct ath_txq		*sc_cabq;	/* tx q for cab frames */
  	struct task		sc_bmisstask;	/* bmiss int processing */
  	struct task		sc_bstucktask;	/* stuck beacon processing */
 +	struct task		sc_resettask;	/* interface reset task */
  	enum {
  		OK,				/* no change needed */
  		UPDATE,				/* update pending */
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
>Unformatted:
