From bill@twwells.com  Sat Feb 16 11:27:03 2002
Return-Path: <bill@twwells.com>
Received: from mail.junkproof.net (mail.junkproof.net [206.55.70.12])
	by hub.freebsd.org (Postfix) with ESMTP id 0D46E37B404
	for <freebsd-gnats-submit@freebsd.org>; Sat, 16 Feb 2002 11:27:03 -0800 (PST)
Received: from mail (helo=mail.junkproof.net)
	by mail.junkproof.net with local-bsmtp (Exim 3.32 #1)
	id 16cAW1-0009l3-00
	for freebsd-gnats-submit@freebsd.org; Sat, 16 Feb 2002 13:28:57 -0600
Received: from bill.twwells.com ( [68.44.48.161] )
	by mail.junkproof.net via tcp with submission
	id 3c6eb1cf-009224; Sat, 16 Feb 2002 13:23:59 -0600
Received: from bill by bill.twwells.com with local (Exim 3.34 #1)
	id 16cAPF-0002Qk-00
	for FreeBSD-gnats-submit@freebsd.org; Sat, 16 Feb 2002 14:21:57 -0500
Message-Id: <E16cAPF-0002Qk-00@bill.twwells.com>
Date: Sat, 16 Feb 2002 14:21:57 -0500
From: Bill Wells <bill@twwells.com>
Reply-To: Bill Wells <bill@twwells.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [PATCH] Fix for pcm driver lockups
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         35004
>Category:       kern
>Synopsis:       [PATCH] Fix for pcm driver lockups
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    sound
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Feb 16 11:30:01 PST 2002
>Closed-Date:    Fri Oct 11 06:34:35 PDT 2002
>Last-Modified:  Fri Oct 11 06:34:35 PDT 2002
>Originator:     Bill Wells
>Release:        FreeBSD 4.5-STABLE i386
>Organization:
>Environment:
System: FreeBSD bill.twwells.com 4.5-STABLE FreeBSD 4.5-STABLE #0: Sat Feb 16 04:01:38 EST 2002 toor@bill.twwells.com:/usr/obj/usr/src/sys/BILL i386


	
>Description:
	There is a race condition in pcm that will cause it to
	lock up in certain circumstances. When the driver is
	locked up, all further attempts at accessing it result in
	"Device busy". (NB: The -current fixes don't correct
	this.)

>How-To-Repeat:

	Run this:

	while :; do cp beep.au /dev/audio; done

	And while it is running, enter this:

	cp beep.au /dev/audio

	(beep.au can be anything, but the shorter it is the
	better.)

>Fix:
	The appended patches (against stable) fix this problem and
	a few others as well. Before the patches (use patch -l and
	you'll probably have to fix the spacing by hand), I'll
	explain what they're about.

	The first patch, against sound.c, simply adds some
	newlines to certain debugging messages. The second patch
	involves a number of changes to resolve the lockup problem
	and other problems.

	First, this bit of code in dsp_open (and its equivalent
	for wrch) sets up the channel. Its main problem is that
	the reference count is not updated for a newly opened
	channel but is instead updated if the channel had been
	previously opened. (E.g., an open for reading will not
	increment the count, but a following open for writing
	will cause the *read* count to be incremented.) These
	reference count problems have the effect of preventing
	kldunload from unloading the driver under certain
	circumstances.

	if (rdch) {
		if (flags & FREAD) {
			chn_reset(rdch, fmt);
			if (flags & O_NONBLOCK)
				rdch->flags |= CHN_F_NBIO;
		} else {
			CHN_LOCK(rdch);
			pcm_chnref(rdch, 1);
		}
		CHN_UNLOCK(rdch);
	}

	Here's the replacement code. This code is much simpler
	because it takes advantage of a couple of facts. First, if
	FREAD is set, rdch (i_dev->si_drv1) *must* have previously
	been null and *must* be non-null now. So, no need to test
	it. Also, the channel that was allocated is already
	locked, so no need to do that again.

	if (flags & FREAD) {
		chn_reset(rdch, fmt);
		if (flags & O_NONBLOCK)
			rdch->flags |= CHN_F_NBIO;
		pcm_chnref(rdch, 1);
		CHN_UNLOCK(rdch);
	}
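
	The difference between the two versions can be traced in
	user space. What follows is only a sketch: toy_chan,
	TOY_FREAD and both toy_open_* functions are hypothetical
	stand-ins, not the driver's types or API, and the channel
	lock is modeled as a plain flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's FREAD open flag. */
#define TOY_FREAD 0x1

/* Hypothetical stand-in for a pcm channel. */
struct toy_chan {
	int refcount;	/* opens currently referring to this channel */
	int locked;	/* 1 while the pretend channel lock is held */
};

/* Original shape: only a channel from an earlier open gets counted. */
void
toy_open_old(struct toy_chan *rdch, int flags)
{
	if (rdch != NULL) {
		if (flags & TOY_FREAD) {
			/* newly allocated, arrives locked; count untouched */
		} else {
			rdch->locked = 1;
			rdch->refcount++;
		}
		rdch->locked = 0;
	}
}

/* Patched shape: the channel this open allocated is counted, then unlocked. */
void
toy_open_new(struct toy_chan *rdch, int flags)
{
	if (flags & TOY_FREAD) {
		rdch->refcount++;
		rdch->locked = 0;
	}
}
```

	For a channel that arrives freshly allocated and locked,
	toy_open_old(&ch, TOY_FREAD) leaves ch.refcount at 0,
	while toy_open_new(&ch, TOY_FREAD) leaves it at 1; both
	leave the channel unlocked.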

	The remaining changes are in dsp_close. This bit of code
	(and the equivalent wrch code) unlocks the channel only
	if the reference count stays above zero; if the count
	drops to zero, the channel is left locked. That means a
	channel can still be locked when dsp_close exits early; I
	don't think that's intended. The unlock goes after the
	"if", not inside it.

	if (rdch) {
		CHN_LOCK(rdch);
		if (pcm_chnref(rdch, -1) > 0) {
			CHN_UNLOCK(rdch);
			exit = 1;
		}
	}
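
	The effect of moving the unlock can be checked with a
	small user-space model. This is a sketch, not driver
	code: toy_chan and the toy_* functions are hypothetical,
	the lock is a plain flag, and toy_chnref is assumed to
	behave like pcm_chnref (adjust the count and return the
	new value).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pcm channel. */
struct toy_chan {
	int refcount;	/* opens currently referring to this channel */
	int locked;	/* 1 while the pretend channel lock is held */
};

/* Assumed to mirror pcm_chnref: adjust the count, return the new value. */
int
toy_chnref(struct toy_chan *c, int delta)
{
	assert(c->locked);	/* caller must hold the channel lock */
	c->refcount += delta;
	return c->refcount;
}

/* Lock, drop one reference, unlock only if references remain. */
int
toy_drop_old(struct toy_chan *c)
{
	int remaining;

	c->locked = 1;
	remaining = toy_chnref(c, -1);
	if (remaining > 0)
		c->locked = 0;	/* the unlock sits inside the "if" */
	return remaining > 0;
}

/* Lock, drop one reference, always unlock. */
int
toy_drop_new(struct toy_chan *c)
{
	int remaining;

	c->locked = 1;
	remaining = toy_chnref(c, -1);
	c->locked = 0;		/* the unlock sits after the "if" */
	return remaining > 0;
}

/* Original close path; returns the value of the "exit" flag. */
int
toy_close_old(struct toy_chan *rdch, struct toy_chan *wrch)
{
	int early = 0;

	if (rdch != NULL && toy_drop_old(rdch))
		early = 1;
	if (wrch != NULL && toy_drop_old(wrch))
		early = 1;
	return early;
}

/* Patched close path; returns the value of the "exit" flag. */
int
toy_close_new(struct toy_chan *rdch, struct toy_chan *wrch)
{
	int early = 0;

	if (rdch != NULL && toy_drop_new(rdch))
		early = 1;
	if (wrch != NULL && toy_drop_new(wrch))
		early = 1;
	return early;
}
```

	With a read channel holding one reference and a write
	channel holding two, toy_close_old returns 1 (the early
	exit) with the read channel still marked locked, while
	toy_close_new returns 1 with both channels unlocked.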

	Note this comment that I've included in the patch. If the
	answer is "no", then all is good and the comment can go
	away. Otherwise, it's necessary to someday fix the problem
	and the comment should be retained in the code.

	/* XXX And what happens if one of the channels had 2 references and
	   the other has but one? The latter won't get reset. Can that
	   happen? */

	In the original code, these two lines are executed if both
	of the reference counts go to zero but not if either is
	nonzero. If the reference counts are supposed to indicate
	the number of references from si_drv?'s, that's wrong;
	these need to be nulled out regardless of what the
	reference counts do.

	i_dev->si_drv1 = NULL;
	i_dev->si_drv2 = NULL;

	So, one fix involving these lines is to add them to the
	"if (exit)" code. This ensures that the reference counts
	correspond to the references.

	However, that's not the only problem with these two lines.
	In the original code, the fields are cleared before the
	abort and flush. In my patch, they are cleared after it.
	This misplacement is the actual cause of the lockups.
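
	The two orderings can be compared in a toy model. This
	sketch does not reproduce the race or explain how the
	lockup comes about; it only records what would be
	visible between the two steps of a close. All names here
	(toy_dev, toy_chan, toy_view and the toy_close_*
	functions) are hypothetical, and "busy" is an assumed
	stand-in for a channel that has not yet been released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel: busy until it has been released. */
struct toy_chan {
	int busy;
};

/* Hypothetical device node; drv1 stands in for i_dev->si_drv1. */
struct toy_dev {
	struct toy_chan *drv1;
};

/* State visible between the two steps of a close. */
struct toy_view {
	int pointer_set;	/* device still points at a channel */
	int chan_busy;		/* channel not yet released */
};

/* Original order: clear the pointer, then abort/flush/release. */
struct toy_view
toy_close_clear_first(struct toy_dev *dev)
{
	struct toy_chan *ch = dev->drv1;
	struct toy_view mid;

	dev->drv1 = NULL;			/* step 1 */
	mid.pointer_set = (dev->drv1 != NULL);
	mid.chan_busy = ch->busy;
	ch->busy = 0;				/* step 2 */
	return mid;
}

/* Patched order: abort/flush/release, then clear the pointer. */
struct toy_view
toy_close_release_first(struct toy_dev *dev)
{
	struct toy_chan *ch = dev->drv1;
	struct toy_view mid;

	ch->busy = 0;				/* step 1 */
	mid.pointer_set = (dev->drv1 != NULL);
	mid.chan_busy = ch->busy;
	dev->drv1 = NULL;			/* step 2 */
	return mid;
}
```

	In the original order the intermediate state is a device
	with no channel pointer whose channel is still busy; in
	the patched order the pointer stays set until the
	channel has been released. Both orders end in the same
	final state.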

*** sound.c.orig        Fri Feb 15 16:44:01 2002
--- sound.c     Sat Feb 16 03:28:30 2002
***************
*** 416,434 ****

	snd_mtxlock(d->lock);
	if (d->inprog) {
!               device_printf(dev, "unregister: operation in progress");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}
	SLIST_FOREACH(sce, &d->channels, link) {
		if (sce->channel->refcount > 0) {
!                       device_printf(dev, "unregister: channel busy");
			snd_mtxunlock(d->lock);
			return EBUSY;
		}
	}
	if (mixer_uninit(dev)) {
!               device_printf(dev, "unregister: mixer busy");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}
--- 416,434 ----

	snd_mtxlock(d->lock);
	if (d->inprog) {
!               device_printf(dev, "unregister: operation in progress\n");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}
	SLIST_FOREACH(sce, &d->channels, link) {
		if (sce->channel->refcount > 0) {
!                       device_printf(dev, "unregister: channel busy\n");
			snd_mtxunlock(d->lock);
			return EBUSY;
		}
	}
	if (mixer_uninit(dev)) {
!               device_printf(dev, "unregister: mixer busy\n");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}


*** dsp.c.orig  Fri Feb 15 16:28:58 2002
--- dsp.c       Sat Feb 16 03:33:40 2002
***************
*** 240,265 ****
	/* finished with snddev, new channels still locked */

	/* bump refcounts, reset and unlock any channels that we just opened */
-       if (rdch) {
		if (flags & FREAD) {
			chn_reset(rdch, fmt);
			if (flags & O_NONBLOCK)
				rdch->flags |= CHN_F_NBIO;
-               } else {
-                       CHN_LOCK(rdch);
			pcm_chnref(rdch, 1);
-               }
		CHN_UNLOCK(rdch);
	}
-       if (wrch) {
		if (flags & FWRITE) {
			chn_reset(wrch, fmt);
			if (flags & O_NONBLOCK)
				wrch->flags |= CHN_F_NBIO;
-               } else {
-                       CHN_LOCK(wrch);
			pcm_chnref(wrch, 1);
-               }
		CHN_UNLOCK(wrch);
	}
	splx(s);
--- 240,257 ----
***************
*** 286,316 ****
	if (rdch) {
		CHN_LOCK(rdch);
		if (pcm_chnref(rdch, -1) > 0) {
-                       CHN_UNLOCK(rdch);
			exit = 1;
		}
	}
	if (wrch) {
		CHN_LOCK(wrch);
		if (pcm_chnref(wrch, -1) > 0) {
-                       CHN_UNLOCK(wrch);
			exit = 1;
		}
	}
	if (exit) {
		snd_mtxunlock(d->lock);
		splx(s);
		return 0;
	}
-
	/* both refcounts are zero, abort and release */

	if (d->fakechan)
		d->fakechan->flags = 0;

-       i_dev->si_drv1 = NULL;
-       i_dev->si_drv2 = NULL;
-
	d->flags &= ~SD_F_TRANSIENT;
	snd_mtxunlock(d->lock);

--- 278,310 ----
	if (rdch) {
		CHN_LOCK(rdch);
		if (pcm_chnref(rdch, -1) > 0) {
			exit = 1;
		}
+               CHN_UNLOCK(rdch);
	}
	if (wrch) {
		CHN_LOCK(wrch);
		if (pcm_chnref(wrch, -1) > 0) {
			exit = 1;
		}
+               CHN_UNLOCK(wrch);
	}
+       /* XXX And what happens if one of the channels had 2 references and
+          the other has but one? The latter won't get reset. Can that
+          happen? */
+
	if (exit) {
+               i_dev->si_drv1 = NULL;
+               i_dev->si_drv2 = NULL;
		snd_mtxunlock(d->lock);
		splx(s);
		return 0;
	}
	/* both refcounts are zero, abort and release */

	if (d->fakechan)
		d->fakechan->flags = 0;

	d->flags &= ~SD_F_TRANSIENT;
	snd_mtxunlock(d->lock);

***************
*** 326,331 ****
--- 320,327 ----
		chn_reset(wrch, 0);
		pcm_chnrelease(wrch);
	}
+       i_dev->si_drv1 = NULL;
+       i_dev->si_drv2 = NULL;

	splx(s);
	return 0;
>Release-Note:
>Audit-Trail:

From: Scott Lampert <scott@lampert.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/35004: [PATCH] Fix for pcm driver lockups
Date: Sat, 15 Jun 2002 11:45:16 -0700

     I had this same issue with 4.5-RELEASE with both a Sound Blaster 
 Live and an AudioPCI ES1371-A.  This is just a follow-up to note 
 that I've been running with this patch for 2 months and it definitely 
 fixes this issue!
 
Responsible-Changed-From-To: freebsd-bugs->sound 
Responsible-Changed-By: dwmalone 
Responsible-Changed-When: Sat Jul 13 12:46:37 PDT 2002 
Responsible-Changed-Why:  
Assign this PR to the sound guys. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=35004 

From: Anish Mistry <amistry@am-productions.yi.org>
To: <freebsd-gnats-submit@FreeBSD.org>, <bill@twwells.com>
Cc:  
Subject: Re: kern/35004: [PATCH] Fix for pcm driver lockups
Date: Tue, 16 Jul 2002 23:04:16 -0400 (EDT)

   This message is in MIME format.  The first part should be readable text,
   while the remaining parts are likely unreadable without MIME-aware tools.
   Send mail to mime@docserver.cac.washington.edu for more info.
 
 --0-109439507-1026875056=:31631
 Content-Type: TEXT/PLAIN; charset=US-ASCII
 
 I've updated the dsp fix patch for 4.6. I've been running it for a few days
 with the program (mencoder) that was causing the lockups before, without
 the lockups or any other problems :).
 The updated patch is attached.  The same patch is also pasted below:
 
 
 --- dsp.c.orig  Tue Jul 16 22:49:08 2002
 +++ dsp.c       Sat Jul 13 22:43:06 2002
 @@ -283,7 +283,6 @@
         /* finished with snddev, new channels still locked */
 
         /* bump refcounts, reset and unlock any channels that we just opened */
 -       if (rdch) {
                 if (flags & FREAD) {
                         if (chn_reset(rdch, fmt)) {
                                 pcm_lock(d);
 @@ -296,13 +295,10 @@
                         }
                         if (flags & O_NONBLOCK)
                                 rdch->flags |= CHN_F_NBIO;
 -               } else
 -                       CHN_LOCK(rdch);
 
                 pcm_chnref(rdch, 1);
                 CHN_UNLOCK(rdch);
 -       }
 -       if (wrch) {
 +               }
                 if (flags & FWRITE) {
                         if (chn_reset(wrch, fmt)) {
                                 pcm_lock(d);
 @@ -315,12 +311,10 @@
                         }
                         if (flags & O_NONBLOCK)
                                 wrch->flags |= CHN_F_NBIO;
 -               } else
 -                       CHN_LOCK(wrch);
 
                 pcm_chnref(wrch, 1);
                 CHN_UNLOCK(wrch);
 -       }
 +               }
         splx(s);
         return 0;
  }
 @@ -345,18 +339,23 @@
         if (rdch) {
                 CHN_LOCK(rdch);
                 if (pcm_chnref(rdch, -1) > 0) {
 -                       CHN_UNLOCK(rdch);
                         exit = 1;
                 }
 +               CHN_UNLOCK(rdch);
         }
         if (wrch) {
                 CHN_LOCK(wrch);
                 if (pcm_chnref(wrch, -1) > 0) {
 -                       CHN_UNLOCK(wrch);
                         exit = 1;
                 }
 +               CHN_UNLOCK(wrch);
         }
 +       /* XXX And what happens if one of the channels had 2 references and
 +       the other has but one? The latter won't get reset. Can that
 +       happen? */
         if (exit) {
 +               i_dev->si_drv1 = NULL;
 +               i_dev->si_drv2 = NULL;
                 pcm_unlock(d);
                 splx(s);
                 return 0;
 @@ -367,9 +366,6 @@
         if (pcm_getfakechan(d))
                 pcm_getfakechan(d)->flags = 0;
 
 -       i_dev->si_drv1 = NULL;
 -       i_dev->si_drv2 = NULL;
 -
         dsp_set_flags(i_dev, dsp_get_flags(i_dev) & ~SD_F_TRANSIENT);
         pcm_unlock(d);
 
 @@ -385,6 +381,8 @@
                 chn_reset(wrch, 0);
                 pcm_chnrelease(wrch);
         }
 +       i_dev->si_drv1 = NULL;
 +       i_dev->si_drv2 = NULL;
 
         splx(s);
         return 0;
 
 
 Thanks,
 
 Anish Mistry
 amistry@am-productions.yi.org
 AM Productions http://am-productions.yi.org/
 
 --0-109439507-1026875056=:31631
 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="pcm-lock.patch"
 Content-Transfer-Encoding: BASE64
 Content-ID: <Pine.BSF.4.33.0207162304160.31631@am-productions.yi.org>
 Content-Description: 
 Content-Disposition: attachment; filename="pcm-lock.patch"
 
 [base64-encoded body of pcm-lock.patch omitted; it duplicates the patch pasted inline above]
 --0-109439507-1026875056=:31631--

From: "Georg-W. Koltermann" <g.w.k@web.de>
To: freebsd-gnats-submit@FreeBSD.org
Cc: bill@twwells.com
Subject: Re: kern/35004: [PATCH] Fix for pcm driver lockups
Date: Sat, 10 Aug 2002 23:13:14 +0200

 I also had this lockup problem, and it was fixed for me as well by 
 applying the patch to dsp.c from this PR.
 
 Could some kind soul please commit the patch? Thnx.
 
 --
 Regards,
 Georg.
 
 

From: Orion Hodson <orion@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org, bill@twwells.com
Cc:  
Subject: Re: kern/35004: [PATCH] Fix for pcm driver lockups
Date: Mon, 12 Aug 2002 19:58:32 -0700

 I have reviewed and tested the latest revision of the patch on -stable
 and it looks good.  Testing on -current will have to wait until this
 weekend as my crash box is out on loan.  Hopefully we can MFC in
 plenty of time for 4.7R.
 
 Thanks
 - Orion
 

From: Andrew Martin <ugly@inhuman.org>
To: freebsd-gnats-submit@FreeBSD.org, bill@twwells.com
Cc:  
Subject: Re: kern/35004: [PATCH] Fix for pcm driver lockups
Date: Thu, 15 Aug 2002 22:39:35 -0400

 I was also experiencing the lockup problem on 4.6-STABLE until I applied
 Anish Mistry's 4.6 patch.  I've been running it for a couple weeks now
 without any lockups or other issues.
 
 -Andrew
State-Changed-From-To: open->suspended 
State-Changed-By: orion 
State-Changed-When: Sun Aug 18 07:17:20 PDT 2002 
State-Changed-Why:  
Patch applied to -CURRENT today.  Expect -STABLE to follow in 3 days' 
time.  Apologies for this taking such a long time. 

Thanks 
- Orion 


http://www.freebsd.org/cgi/query-pr.cgi?pr=35004 
State-Changed-From-To: suspended->feedback 
State-Changed-By: orion 
State-Changed-When: Fri Aug 30 09:25:36 PDT 2002 
State-Changed-Why:  
A patch in the spirit of the original submission has been applied to 
both -CURRENT and -STABLE now.  I would appreciate anybody interested 
in this issue testing the patch and confirming they are happy with it. 

The -STABLE version of the file can be found here: 

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/sound/pcm/dsp.c?rev=1.15.2.13&content-type=text/x-cvsweb-markup 

And the diff of the commit here: 

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/sound/pcm/dsp.c.diff?r1=1.15.2.12&r2=1.15.2.13&f=h 

Thanks 
- Orion 


http://www.freebsd.org/cgi/query-pr.cgi?pr=35004 

From: "Ulrich 'Q' Spoerlein" <q@uni.de>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/35004: [PATCH] Fix for pcm driver lockups
Date: Tue, 08 Oct 2002 21:31:34 +0200

 Since the patch has been committed to stable, and it fixed the lockups on
 my machine, I think this PR can be closed.
State-Changed-From-To: feedback->closed 
State-Changed-By: orion 
State-Changed-When: Fri Oct 11 06:34:03 PDT 2002 
State-Changed-Why:  
Feedback on 4.7R says this can be closed. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=35004 
>Unformatted:
