From nobody@FreeBSD.org  Sat Sep 22 08:37:25 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 4913C106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Sep 2012 08:37:25 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 2A8AB8FC0C
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Sep 2012 08:37:25 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.5/8.14.5) with ESMTP id q8M8bOM9064926
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Sep 2012 08:37:24 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.5/8.14.5/Submit) id q8M8bO6P064925;
	Sat, 22 Sep 2012 08:37:24 GMT
	(envelope-from nobody)
Message-Id: <201209220837.q8M8bO6P064925@red.freebsd.org>
Date: Sat, 22 Sep 2012 08:37:24 GMT
From: Fabian Keil <fk@fabiankeil.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [geom] g_wither_washer() keeping a core busy
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         171865
>Category:       kern
>Synopsis:       [geom] [patch] g_wither_washer() keeping a core busy
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-geom
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 22 08:40:07 UTC 2012
>Closed-Date:    Mon Apr 01 11:21:02 UTC 2013
>Last-Modified:  Mon Apr 01 11:21:02 UTC 2013
>Originator:     Fabian Keil
>Release:        HEAD
>Organization:
>Environment:
FreeBSD r500.local 10.0-CURRENT FreeBSD 10.0-CURRENT #484 r+345840c: Fri Sep 21 20:20:56 CEST 2012     fk@r500.local:/usr/obj/usr/src/sys/ZOEY  amd64
>Description:
In http://lists.freebsd.org/pipermail/freebsd-fs/2011-June/011855.html
I reported that g_wither_washer() is called more than 400000 times per
second after a device is lost, keeping a CPU core busy:

fk@r500 ~ $sudo dtrace -n 'fbt:kernel:g_*:entry { @[probefunc, stack()] = count(); } tick-1sec { trunc(@, 3); printa(@); trunc(@)}'
dtrace: description 'fbt:kernel:g_*:entry ' matched 359 probes
CPU     ID                    FUNCTION:NAME
  0  32988                       :tick-1sec 
  g_wither_washer                                   
              kernel`g_run_events+0x3b5
              kernel`0xffffffff8084967e
           446626

  0  32988                       :tick-1sec 
  g_trace                                           
              kernel`g_io_request+0x4d
              kernel`g_io_schedule_down+0x25f
              kernel`g_down_procbody+0x6d
              kernel`fork_exit+0x9a
              kernel`0xffffffff8084967e
              230
  g_trace                                           
              kernel`g_io_deliver+0x7a
              kernel`g_up_procbody+0x6d
              kernel`fork_exit+0x9a
              kernel`0xffffffff8084967e
              230
[...]

I recently found a way to reproduce the problem without using
ZFS or writing to the device.
>How-To-Repeat:
geli onetime /dev/md0
geom sched insert -a rr /dev/md0.eli
geli detach /dev/md0.eli.sched.

>Fix:
I don't have a fix, but the attached patch can be used as a workaround.

With the patch applied, setting kern.geom.debugflags to 256 stops
g_run_events() from calling g_wither_washer(). The flag can then be set
back to 0, but the problem returns after the next geom "event".
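
The check the workaround adds can be illustrated with a small userland
sketch. This is not kernel code: should_run_washer() is a hypothetical
helper invented for illustration, and only the flag values are taken from
the patch to sys/geom/geom_int.h.

```c
#include <stdbool.h>

/* Flag values as they appear in sys/geom/geom_int.h with the patch applied. */
#define G_F_DISKIOCTL		64
#define G_F_CTLDUMP		128
#define G_F_STOP_WITHERING	256

/*
 * Hypothetical helper (assumption, not part of GEOM): decide whether the
 * event loop should call the washer on this pass.  wither_work stands in
 * for g_wither_work and debugflags for g_debugflags.
 */
bool
should_run_washer(int wither_work, int debugflags)
{
	/* The workaround: pretend there is no wither work while the flag is set. */
	if (debugflags & G_F_STOP_WITHERING)
		return (false);
	return (wither_work != 0);
}
```

With debugflags set to 256 the helper returns false even when wither work
is pending, which corresponds to the patch forcing the loop variable to 0.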

Patch attached with submission follows:

From 8680caf9ab5322377736f62cd4eb674a938bb445 Mon Sep 17 00:00:00 2001
From: Fabian Keil <fk@fabiankeil.de>
Date: Thu, 12 Jul 2012 12:38:00 +0200
Subject: [PATCH] Allow to use kern.geom.debugflags to prevent g_run_events()
 from calling g_wither_washer()

Workaround for geom keeping a whole core busy failing
to remove a lost device.
---
 sys/geom/geom_event.c | 3 +++
 sys/geom/geom_int.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/sys/geom/geom_event.c b/sys/geom/geom_event.c
index 3805dcd..b9bfc25 100644
--- a/sys/geom/geom_event.c
+++ b/sys/geom/geom_event.c
@@ -47,6 +47,7 @@ __FBSDID("$FreeBSD: src/sys/geom/geom_event.c,v 1.62 2012/07/29 11:51:48 mav Exp
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
+#include <sys/sysctl.h>
 #include <sys/proc.h>
 #include <sys/errno.h>
 #include <sys/time.h>
@@ -286,6 +287,8 @@ g_run_events()
 			;
 		mtx_assert(&g_eventlock, MA_OWNED);
 		*i = g_wither_work;
+		if (g_debugflags & G_F_STOP_WITHERING)
+			*i = 0;
 		if (*i) {
 			mtx_unlock(&g_eventlock);
 			while (*i) {
diff --git a/sys/geom/geom_int.h b/sys/geom/geom_int.h
index 50f3a2a..0c11be8 100644
--- a/sys/geom/geom_int.h
+++ b/sys/geom/geom_int.h
@@ -50,6 +50,7 @@ extern int g_debugflags;
  */
 #define G_F_DISKIOCTL	64
 #define G_F_CTLDUMP	128
+#define G_F_STOP_WITHERING 256
 
 /* geom_dump.c */
 void g_confxml(void *, int flag);
-- 
1.7.11.5



>Release-Note:
>Audit-Trail:

From: Fabian Keil <fk@fabiankeil.de>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/171865: [geom] g_wither_washer() keeping a core busy
Date: Sun, 23 Sep 2012 12:56:23 +0200

 --Sig_/KU5E65A_olXP4OejkTDNCQi
 Content-Type: multipart/mixed; boundary="MP_/v14wCMsERAWZ+BN5MCnuOBj"
 
 --MP_/v14wCMsERAWZ+BN5MCnuOBj
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: inline
 
 The attached patch actually applies against a vanilla tree.
 
 Fabian
 
 --MP_/v14wCMsERAWZ+BN5MCnuOBj
 Content-Type: text/x-patch
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment;
  filename=0001-Allow-to-use-kern.geom.debugflags-to-prevent-g_run_e.diff
 
 From 86f9c2e1f3c49a1f3b699091521e97c268c0e8a5 Mon Sep 17 00:00:00 2001
 From: Fabian Keil <fk@fabiankeil.de>
 Date: Thu, 12 Jul 2012 12:38:00 +0200
 Subject: [PATCH] Allow to use kern.geom.debugflags to prevent g_run_events()
  from calling g_wither_washer()
 
 Workaround for geom keeping a whole core busy failing
 to remove a lost device.
 ---
  sys/geom/geom_event.c | 3 +++
  sys/geom/geom_int.h   | 1 +
  2 files changed, 4 insertions(+)
 
 diff --git a/sys/geom/geom_event.c b/sys/geom/geom_event.c
 index e3b5261..7491bc3 100644
 --- a/sys/geom/geom_event.c
 +++ b/sys/geom/geom_event.c
 @@ -47,6 +47,7 @@ __FBSDID("$FreeBSD: src/sys/geom/geom_event.c,v 1.62 2012/07/29 11:51:48 mav Exp
  #include <sys/kernel.h>
  #include <sys/lock.h>
  #include <sys/mutex.h>
 +#include <sys/sysctl.h>
  #include <sys/proc.h>
  #include <sys/errno.h>
  #include <sys/time.h>
 @@ -281,6 +282,8 @@ g_run_events()
  			;
  		mtx_assert(&g_eventlock, MA_OWNED);
  		i = g_wither_work;
 +		if (g_debugflags & G_F_STOP_WITHERING)
 +			i = 0;
  		if (i) {
  			mtx_unlock(&g_eventlock);
  			while (i) {
 diff --git a/sys/geom/geom_int.h b/sys/geom/geom_int.h
 index 50f3a2a..0c11be8 100644
 --- a/sys/geom/geom_int.h
 +++ b/sys/geom/geom_int.h
 @@ -50,6 +50,7 @@ extern int g_debugflags;
   */
  #define G_F_DISKIOCTL	64
  #define G_F_CTLDUMP	128
 +#define G_F_STOP_WITHERING 256
  
  /* geom_dump.c */
  void g_confxml(void *, int flag);
 -- 
 1.7.11.5
 
 
 --MP_/v14wCMsERAWZ+BN5MCnuOBj--
 
 --Sig_/KU5E65A_olXP4OejkTDNCQi
 Content-Type: application/pgp-signature; name=signature.asc
 Content-Disposition: attachment; filename=signature.asc
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (FreeBSD)
 
 iEYEARECAAYFAlBe6ukACgkQSMVSH78upWPNlgCgggUNpxSe71/g7mMol0cDU1+6
 rgUAn0u3axvgaA+dkbHFPs/NMVlPobMt
 =lsto
 -----END PGP SIGNATURE-----
 
 --Sig_/KU5E65A_olXP4OejkTDNCQi--

From: Jaakko Heinonen <jh@FreeBSD.org>
To: Fabian Keil <fk@fabiankeil.de>
Cc: bug-followup@FreeBSD.org, luigi@FreeBSD.org
Subject: Re: kern/171865: [geom] g_wither_washer() keeping a core busy
Date: Tue, 25 Sep 2012 17:06:54 +0300

 On 2012-09-22, Fabian Keil wrote:
 > I recently found a way to reproduce the problem without using
 > ZFS or writing to the device.
 > >How-To-Repeat:
 > geli onetime /dev/md0
 > geom sched insert -a rr /dev/md0.eli
 > geli detach /dev/md0.eli.sched.
 
 It seems that if you "insert" a sched geom and do "geli detach" on it,
 the geli geom can't be destroyed.
 
 After your commands "md0.eli" still exists:
 
 # geli list
 Geom name: md0.eli
 Providers:
 1. Name: md0.eli
    Mediasize: 10485760 (10M)
    Sectorsize: 512
    Mode: r0w0e0
 # geli detach md0.eli
 geli: No such device: md0.eli.
 
 I didn't find a way to destroy it. I suspect a geom_sched bug. luigi@
 cc'd.
 
 -- 
 Jaakko

From: Fabian Keil <fk@fabiankeil.de>
To: Jaakko Heinonen <jh@FreeBSD.org>
Cc: bug-followup@FreeBSD.org, luigi@FreeBSD.org
Subject: Re: kern/171865: [geom] g_wither_washer() keeping a core busy
Date: Wed, 26 Sep 2012 17:41:16 +0200

 --Sig_/9Up1jT9Q9Fv4+b.oSjPAz2=
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable
 
 Jaakko Heinonen <jh@FreeBSD.org> wrote:
 
 > On 2012-09-22, Fabian Keil wrote:
 > > I recently found a way to reproduce the problem without using
 > > ZFS or writing to the device.
 > > >How-To-Repeat:
 > > geli onetime /dev/md0
 > > geom sched insert -a rr /dev/md0.eli
 > > geli detach /dev/md0.eli.sched.
 >
 > It seems that if you "insert" a sched geom and do "geli detach" on it,
 > the geli geom can't be destroyed.
 >
 > After your commands "md0.eli" still exists:
 
 > I didn't find a way to destroy it. I suspect a geom_sched bug. luigi@
 > cc'd.
 
 While I can't rule out a geom_sched bug, I usually run into the
 problem while only using glabel+geli+ZFS on a USB device that
 disappears as described in the initial report at:
 http://lists.freebsd.org/pipermail/freebsd-fs/2011-June/011855.html
 
 It's just less convenient to reproduce as it requires more steps
 and the disappearance can also lead to panics like these:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
 
 Fabian
 
 --Sig_/9Up1jT9Q9Fv4+b.oSjPAz2=
 Content-Type: application/pgp-signature; name=signature.asc
 Content-Disposition: attachment; filename=signature.asc
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (FreeBSD)
 
 iEYEARECAAYFAlBjIiAACgkQSMVSH78upWM/pQCfd7TY7/GOblu08UXFUzF2XDNP
 Y9gAnjXvGj4MkFFGmamXTlsP6mkwiGiJ
 =XXQN
 -----END PGP SIGNATURE-----
 
 --Sig_/9Up1jT9Q9Fv4+b.oSjPAz2=--
Responsible-Changed-From-To: freebsd-bugs->freebsd-geom 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Oct 6 03:25:16 UTC 2012 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=171865 
State-Changed-From-To: open->closed 
State-Changed-By: mav 
State-Changed-When: Mon Apr 1 11:19:17 UTC 2013 
State-Changed-Why:  
r248674 fixed the problem: g_wither_washer() is now rerun only after 
further changes in the GEOM topology. 
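
The difference between the two behaviours can be sketched with a simplified
userland model. This is not the code from r248674: struct topo_model and
both run_events_*() functions are hypothetical, and the "spinning" variant
is only an assumption about how the loop behaved before the fix, based on
the call counts reported above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the GEOM state that matters here. */
struct topo_model {
	int	stuck_geoms;	/* withering geoms that cannot be destroyed yet */
	bool	wither_pending;	/* set when a topology change requests a wash */
	long	washer_calls;	/* number of washer passes so far */
};

/* One washer pass in the model: nothing can be freed, so only count it. */
static void
washer_pass(struct topo_model *m)
{
	m->washer_calls++;
}

/*
 * Assumed old behaviour: keep rerunning while any withering geom remains.
 * max_passes bounds the loop so that the model terminates.
 */
long
run_events_spinning(struct topo_model *m, long max_passes)
{
	long passes = 0;

	while (m->stuck_geoms > 0 && passes < max_passes) {
		washer_pass(m);
		passes++;
	}
	return (passes);
}

/*
 * Behaviour described for the fix: run the washer only when a topology
 * change has requested it, and clear the request before each pass.
 */
long
run_events_on_change(struct topo_model *m, long max_passes)
{
	long passes = 0;

	while (m->wither_pending && passes < max_passes) {
		m->wither_pending = false;
		washer_pass(m);
		passes++;
	}
	return (passes);
}
```

In this model a single stuck geom makes run_events_spinning() use up its
whole pass budget, while run_events_on_change() performs one pass per
requested wash and then stops until wither_pending is set again.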

>Unformatted:
