From mm@FreeBSD.org  Thu Jun  9 14:10:50 2011
Return-Path: <mm@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4173A106564A
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  9 Jun 2011 14:10:50 +0000 (UTC)
	(envelope-from mm@mail.vx.sk)
Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3])
	by mx1.freebsd.org (Postfix) with ESMTP id 9B8038FC08
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  9 Jun 2011 14:10:49 +0000 (UTC)
Received: from core.vx.sk (localhost [127.0.0.1])
	by mail.vx.sk (Postfix) with ESMTP id 5F8A6175C1F
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  9 Jun 2011 16:10:48 +0200 (CEST)
Received: from mail.vx.sk ([127.0.0.1])
	by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id eqy42TrLpHtm for <FreeBSD-gnats-submit@freebsd.org>;
	Thu,  9 Jun 2011 16:10:46 +0200 (CEST)
Received: by mail.vx.sk (Postfix, from userid 1001)
	id 6FC2F175C1A; Thu,  9 Jun 2011 16:10:45 +0200 (CEST)
Message-Id: <20110609141046.6FC2F175C1A@mail.vx.sk>
Date: Thu,  9 Jun 2011 16:10:45 +0200 (CEST)
From: Martin Matuska <mm@FreeBSD.org>
Reply-To: Martin Matuska <mm@FreeBSD.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: zfs (v28) incremental receive may leave behind temporary clones
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         157728
>Category:       kern
>Synopsis:       [zfs] zfs (v28) incremental receive may leave behind temporary clones
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-fs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jun 09 14:20:08 UTC 2011
>Closed-Date:    Mon Aug 22 10:15:18 UTC 2011
>Last-Modified:  Mon Aug 22 10:15:18 UTC 2011
>Originator:     Martin Matuska
>Release:        FreeBSD 8.2-STABLE amd64
>Organization:
>Environment:
System: FreeBSD 8.2-STABLE #2 r222851M: Wed Jun  8 07:24:58 CEST 2011
>Description:
	zfs receive (v28) may leave behind temporary clones if a parallel
	zfs list is run on an incremental snapshot being received.

	The temporary clone gets properly removed in Nexenta and OpenIndiana
	(but not immediately after zfs receive finishes - there is
	a small delay).

	In FreeBSD, something prevents the process/thread responsible for
	removing the clone from doing its job.
>How-To-Repeat:
Script:
#!/bin/sh
# Reproduce the bug: race a "zfs list" against an incremental
# "zfs receive" so the temporary clone (test/d2/%s2) is left behind.
zpool destroy test
if [ ! -f "/tmp/testfile" ]; then
	dd if=/dev/zero of=/tmp/testfile bs=1M count=250
fi
zpool create test /tmp/testfile
zfs create test/d1
#dd if=/dev/zero of=/test/d1/file1 bs=1M count=100
zfs snapshot test/d1@s1
#rm /test/d1/file1
zfs send test/d1@s1 | zfs recv test/d2
zfs snapshot test/d1@s2
# Receive the incremental stream in the background ...
( sleep 1; zfs send -I @s1 test/d1@s2 | zfs recv test/d2 ) &
# ... while polling with zfs list until the incoming snapshot appears.
while test "$OK" != "1"; do
	zfs list -H test/d2@s2 >/dev/null 2>&1 && OK=1
done
zfs destroy test/d2@s1
# A leftover temporary clone shows up with a '%' in its name.
zdb -d test | grep %

Result:
cannot destroy 'test/d2@s1': dataset already exists
Could not open test/d2/%s2, error 16
>Fix:
For now there are only workarounds:
- do not run zfs list while receiving, or
- destroy the temporary clone afterwards (zfs destroy test/d2/%s2)
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Thu Jun 9 19:04:18 UTC 2011 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=157728 

From: Martin Matuska <mm@FreeBSD.org>
To: bug-followup@FreeBSD.org, mm@FreeBSD.org
Cc:  
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind
 temporary clones
Date: Sat, 16 Jul 2011 16:13:12 +0200

 I have debugged this a little and now have more information.
 The snapshot is set for deferred destroy, but the temporary clone does
 not get deleted because of an extra hold:
 
 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c: #3558,
 zfs_ioc_recv():
 end_err = dmu_recv_end(&drc);
 
 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c: #1621,
 dmu_recv_end():
 return (dmu_recv_existing_end(drc));
 
 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c: #1581,
 dmu_recv_existing_end():
 (void) dsl_dataset_destroy(drc->drc_real_ds, dmu_recv_tag, B_FALSE);
 
 This dataset does not get destroyed; the error is EBUSY.
 
 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c: #1158,
 dsl_dataset_destroy():
 
 dstg = dsl_sync_task_group_create(ds->ds_dir->dd_pool);
 dsl_sync_task_create(dstg, dsl_dataset_destroy_check,
     dsl_dataset_destroy_sync, &dsda, tag, 0);
 dsl_sync_task_create(dstg, dsl_dir_destroy_check,
      dsl_dir_destroy_sync, &dummy_ds, FTAG, 0);
 err = dsl_sync_task_group_wait(dstg);
 dsl_sync_task_group_destroy(dstg);
 
 dsl_sync_task_group_wait() calls:
 - dsl_dataset_destroy_check (returns 0)
 - dsl_dir_destroy_check (returns EBUSY, should return 0) <-- error comes
 from here
 
 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c: #466,
 dsl_dir_destroy_check():
 if (dmu_buf_refcount(dd->dd_dbuf) > 2)
     return (EBUSY); <--- EBUSY comes from here
 
 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c: #2101
 #pragma weak dmu_buf_refcount = dbuf_refcount
 uint64_t
 dbuf_refcount(dmu_buf_impl_t *db)
 {
         return (refcount_count(&db->db_holds));
 }
 
 If we issue zfs list or zfs get (recursively or on the dataset),
 db->db_holds for the clone has a value of 3, otherwise a value of 2.
 With 3, destroying the temporary clone fails and the deferred destroy
 of the snapshot fails, too.
 It looks like an extra hold is placed on the temporary clone.
 
 -- 
 Martin Matuska
 FreeBSD committer
 http://blog.vx.sk
 

From: Borja Marcos <borjam@sarenet.es>
To: bug-followup@FreeBSD.org,
 mm@FreeBSD.org
Cc:  
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Date: Thu, 4 Aug 2011 12:39:58 +0200

 I have a clue. I've tried a partial fix and so far seems to work. Now I
 have a loop doing zfs sends of a dataset with a make buildworld
 running, each 30 seconds, and receiving them onto a different pool, on
 which I have a while ( 1 ) ; zfs list ; end loop running.
 
 So far I haven't had issues. The only side effect is that temporary
 datasets can appear in the zfs list output.
 
 Read below for the explanation.
 
 After reading Martin's analysis, seemed quite clear to me that the
 scenario was due to the necessity of getting a consistent snapshot of
 the state of a complex data structure. In this case, I imagined that the
 "list" service would traverse the data structures holding the datasets
 descriptions, and that it would place temporary locks on the elements in
 order to prevent them from being altered while the structure is being
 traversed.
 
 So, a generic "list" service in a fine-grained locking environment and
 rendering a consistent response would be something like that:
 
 - traverse data structure, building a list.
   (each time we get an element, a temporary lock is placed on it)
 - get next element, etc.
 
 - With the complete and consistent list ready, prepare the response.
 
 - Once the response has been built, traverse the grabbed results and
 release the locks.
 
 
 So, where's the problem? In the special treatment of the "hidden"
 datasets.
 
 Looking at
 /usr/src/sys/cddl/contrib/opensolaris/common/fs/zfs/zfs_ioctl.c, at the
 function zfs_ioc_dataset_list_next(zfs_cmd_t *zc)
 
 I see something resembling this idea:
 
 while (error == 0 && dataset_name_hidden(zc->zc_name) &&
             !(zc->zc_iflags & FKIOCTL));
         dmu_objset_rele(os, FTAG);
 
 So, wondering if the problem is this, giving a special treatment to the
 hidden dataset, I've edited the dataset_name_hidden() function so that
 it ignores the "%" datasets.
 
 boolean_t
 dataset_name_hidden(const char *name)
 {
         /*
          * Skip over datasets that are not visible in this zone,
          * internal datasets (which have a $ in their name), and
          * temporary datasets (which have a % in their name).
          */
         if (strchr(name, '$') != NULL)
                 return (B_TRUE);
 /*      if (strchr(name, '%') != NULL)
                 return (B_TRUE); */
         if (!INGLOBALZONE(curthread) && !zone_dataset_visible(name, NULL))
                 return (B_TRUE);
         return (B_FALSE);
 }
 
 I was expecting just a side-effect: a "zfs list" would list the
 "%" datasets.
 
 Done this, I've compiled the kernel, started the test again, and, voila!
 it works.
 
 Of course, now I see the "%" datasets while the zfs receive is running,
 
 pruebazfs3# zfs list -t all
 NAME                            USED  AVAIL  REFER  MOUNTPOINT
 rpool                          1.22G  6.61G  41.3K  /rpool
 rpool/newsrc                   1.22G  6.61G   565M  /rpool/newsrc
 rpool/newsrc@anteshidden        149M      -   973M  -
 rpool/newsrc@parcheteoria1     1.09M      -   973M  -
 rpool/newsrc@20110804_113700       0      -   565M  -
 rpool/newsrc/%20110804_113730  1.31M  6.61G   566M  /rpool/newsrc/%20110804_113730
 
 
 but after zfs receive finishes they are correctly cleaned up
 
 NAME                           USED  AVAIL  REFER  MOUNTPOINT
 rpool                         1.22G  6.61G  41.3K  /rpool
 rpool/newsrc                  1.22G  6.61G   566M  /rpool/newsrc
 rpool/newsrc@anteshidden       149M      -   973M  -
 rpool/newsrc@parcheteoria1    1.09M      -   973M  -
 rpool/newsrc@20110804_113730      0      -   566M  -
 
 
 So: Seems to me that these datasets are a sort of afterthought. The
 ioctl "list" service should not discard them when building the dataset
 list. Instead it should not "print" them, so to speak.
 
 I'm sure this temporary fix can be refined, and I'm wondering if a
 similar issue is lurking somewhere else....
 
 
 
 
 
 
 Borja.
 
 

From: Martin Matuska <mm@FreeBSD.org>
To: Borja Marcos <borjam@sarenet.es>
Cc: bug-followup@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind
 temporary clones
Date: Thu, 04 Aug 2011 14:33:45 +0200

 This is a multi-part message in MIME format.
 --------------090908080609040403050308
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 
 That is not a solution, we want hidden datasets :)
 
 A workaround patch is attached that does not prefetch hidden datasets
 in zfs (btw, why should we do that at all?).
 It doesn't cure the source of the problem, only the symptoms - to
 reproduce the problem you now have to run zfs list or get directly on
 the invisible temporary clone.
 
 Please test.
 
 On 04.08.2011 12:39, Borja Marcos wrote:
 > I have a clue. I've tried a partial fix and so far seems to work. Now I have a loop doing zfs sends of a dataset with a make buildworld  running, each 30 seconds, and receiving them onto a different pool, on which I have a while ( 1 ) ; zfs list ; end loop running.
 >
 > So far I haven't had issues. The only side effect is that temporary datasets can appear in the zfs list output. 
 >
 > Read below for the explanation.
 >
 > After reading Martin's analysis, seemed quite clear to me that the scenario was due to the necessity of getting a consistent snapshot of the state of a complex data structure. In this case, I imagined that the "list" service would traverse the data structures holding the datasets descriptions, and that it would place temporary locks on the elements in order to prevent them from being altered while the structure is being traversed.
 >
 > So, a generic "list" service in a fine-grained locking environment and rendering a consistent response would be something like that:
 >
 > - traverse data structure, building a list.
 >   (each time we get an element, a temporary lock is placed on it)
 > - get next element, etc.
 >
 > - With the complete and consistent list ready, prepare the response.
 >
 > - Once the response has been built, traverse the grabbed results and release the locks.
 >
 >
 > So, where's the problem? In the special treatment of the "hidden" datasets.
 >
 > Looking at /usr/src/sys/cddl/contrib/opensolaris/common/fs/zfs/zfs_ioctl.c, at the function zfs_ioc_dataset_list_next(zfs_cmd_t *zc)
 >
 > I see something resembling this idea:
 >
 > while (error == 0 && dataset_name_hidden(zc->zc_name) &&
 >             !(zc->zc_iflags & FKIOCTL));
 >         dmu_objset_rele(os, FTAG);
 >
 > So, wondering if the problem is this, giving a special treatment to the hidden dataset, I've edited the dataset_name_hidden() function so that it ignores the "%" datasets.
 >
 > boolean_t
 > dataset_name_hidden(const char *name)
 > {
 >         /*
 >          * Skip over datasets that are not visible in this zone,
 >          * internal datasets (which have a $ in their name), and
 >          * temporary datasets (which have a % in their name).
 >          */
 >         if (strchr(name, '$') != NULL)
 >                 return (B_TRUE);
 > /*      if (strchr(name, '%') != NULL)
 >                 return (B_TRUE); */
 >         if (!INGLOBALZONE(curthread) && !zone_dataset_visible(name, NULL))
 >                 return (B_TRUE);
 >         return (B_FALSE);
 > }
 >                 
 >
 > I was expecting just a side-effect: a "zfs list" would list the "%"datasets.
 >
 > Done this, I've compiled the kernel, started the test again, and, voila! it works.
 >
 > Of course, now I see the "%" datasets while the zfs receive is running,
 >
 > pruebazfs3# zfs list -t all
 > NAME                            USED  AVAIL  REFER  MOUNTPOINT
 > rpool                          1.22G  6.61G  41.3K  /rpool
 > rpool/newsrc                   1.22G  6.61G   565M  /rpool/newsrc
 > rpool/newsrc@anteshidden        149M      -   973M  -
 > rpool/newsrc@parcheteoria1     1.09M      -   973M  -
 > rpool/newsrc@20110804_113700       0      -   565M  -
 > rpool/newsrc/%20110804_113730  1.31M  6.61G   566M  /rpool/newsrc/%20110804_113730
 >
 >
 > but after zfs receive finishes they are correctly cleaned up
 >
 > NAME                           USED  AVAIL  REFER  MOUNTPOINT
 > rpool                         1.22G  6.61G  41.3K  /rpool
 > rpool/newsrc                  1.22G  6.61G   566M  /rpool/newsrc
 > rpool/newsrc@anteshidden       149M      -   973M  -
 > rpool/newsrc@parcheteoria1    1.09M      -   973M  -
 > rpool/newsrc@20110804_113730      0      -   566M  -
 >
 >
 > So: Seems to me that these datasets are a sort of afterthought. The ioctl "list" service should not discard them when building the dataset list. Instead it should not "print" them, so to speak.
 >
 > I'm sure this temporary fix can be refined, and I'm wondering if a similar issue is lurking somewhere else....
 -- 
 Martin Matuska
 FreeBSD committer
 http://blog.vx.sk
 
 
 --------------090908080609040403050308
 Content-Type: text/x-patch;
  name="zfs_ioctl.c.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="zfs_ioctl.c.patch"
 
 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
 ===================================================================
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	(revision 224648)
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	(working copy)
 @@ -1963,8 +1963,13 @@ zfs_ioc_dataset_list_next()
  		uint64_t cookie = 0;
  		int len = sizeof (zc->zc_name) - (p - zc->zc_name);
  
 -		while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0)
 -			(void) dmu_objset_prefetch(zc->zc_name, NULL);
 +		while (dmu_dir_list_next(os, len, p, NULL,
 +		    &cookie) == 0) {
 +			if (dataset_name_hidden(zc->zc_name) == B_FALSE) {
 +				(void) dmu_objset_prefetch(zc->zc_name,
 +				    NULL);
 +			}
 +		}
  	}
  
  	do {
 
 --------------090908080609040403050308--

From: Borja Marcos <borjam@sarenet.es>
To: Martin Matuska <mm@FreeBSD.org>
Cc: bug-followup@FreeBSD.org,
 Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Date: Thu, 4 Aug 2011 15:40:03 +0200

 On Aug 4, 2011, at 2:33 PM, Martin Matuska wrote:
 
 > That is not a solution, we want hidden datasets :)
 >
 > A workaround patch is attached that does not prefetch hidden datasets in
 > zfs (btw. why should we do that at all).
 > It doesn't cure the source of the problem but the symptoms - to
 > reproduce the problem you have to run zfs list or get directly on the
 > invisible temporary clone now.
 
 Well, still there might be a subtle problem.
 
 I mean, and sorry if it's a somewhat trivial question, but it's the
 first time I actually read some ZFS internals code ;)
 
 Does that prefetch *imply* a temporary lock being placed? I mean, in
 such a case usually you need an atomic fetch-and-lock
 operation. I'm wondering if not prefetching them could be a problem, and
 instead it would be a better solution to keep prefetching them
 but avoiding to display them, so that any side effects are preserved.
 Otherwise that might have some ugly interaction.
 
 Of course my patch isn't a solution, I wanted a quick experiment to find
 out if the special treatment of the hidden datasets was the issue. But,
 really, the decision not to show a hidden dataset shouldn't be made at
 such a low level because of these interactions. The problem is, the
 patch might work but introduce harder to reproduce issues?
 
 Maybe Pawel can help us, I guess he's much more familiar than us with
 the guts of ZFS ;)
 
 
 
 
 
 Borja.
 

From: Borja Marcos <borjam@sarenet.es>
To: Martin Matuska <mm@FreeBSD.org>
Cc: bug-followup@FreeBSD.org,
 Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Date: Thu, 4 Aug 2011 15:40:47 +0200

 On Aug 4, 2011, at 2:33 PM, Martin Matuska wrote:
 
 > That is not a solution, we want hidden datasets :)
 >
 > A workaround patch is attached that does not prefetch hidden datasets in
 > zfs (btw. why should we do that at all).
 > It doesn't cure the source of the problem but the symptoms - to
 > reproduce the problem you have to run zfs list or get directly on the
 > invisible temporary clone now.
 
 And, besides, shouldn't this be coordinated with the rest of the ZFS
 community?
 
 
 
 
 Borja.
 

From: Martin Matuska <mm@FreeBSD.org>
To: Borja Marcos <borjam@sarenet.es>
Cc: bug-followup@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind
 temporary clones
Date: Thu, 04 Aug 2011 16:18:48 +0200

 What is for sure is that there is an additional lock placed on the clone
 that is not removed, or at least not immediately.
 The fact that you are able to delete the clone afterwards means that the
 lock has been released - it might be a race between tasks.
 In my opinion the lock is placed on any access to the temporary clone
 (no matter if prefetch or fetch).
 But I still think we don't have to prefetch data we are not processing.
 
 I don't think that there will be any ugly interaction in this case.
 The idea of the prefetch code is to speed up access to the data
 structure by caching it into memory.
 So what we don't prefetch (is not cached) will be read the normal way
 (and not from cache).
 
 If you follow its history, you can see it well:
 
 Prefetch for zfs list was introduced in OpenSolaris changeset 8415 and
 didn't change very much since that point:
 http://hg.openindiana.org/illumos-gate/rev/d5525cd1cbc2
 
 If you remove that code, it will still work the way it should, but slower :)
 I still see no problem in not-prefetching hidden datasets.
 
 On 04.08.2011 15:40, Borja Marcos wrote:
 > On Aug 4, 2011, at 2:33 PM, Martin Matuska wrote:
 >
 >
 > Well, still there might be a subtle problem.
 >
 > I mean, and sorry if it's a somewhat trivial question, but it's the first time I actually read some ZFS internals code ;)
 >
 > Does that prefetch *imply* a temporary lock being placed? I mean, in such a case usually you need an atomic fetch-and-lock
 > operation. I'm wondering if not prefetching them could be a problem, and instead it would be a better solution to keep prefetching them
 > but avoiding to display them, so that any side effects are preserved. Otherwise that might have some ugly interaction.
 >
 > Of course my patch isn't a solution, I wanted a quick experiment to find out if the special treatment of the hidden datasets was the issue. But, really, the decision not to show a hidden dataset shouldn't be made at such a low level because of these interactions. The problem is, the patch might work but introduce harder to reproduce issues?
 >
 > Maybe Pawel can help us, I guess he's much more familiar than us with the guts of ZFS ;)
 -- 
 Martin Matuska
 FreeBSD committer
 http://blog.vx.sk
 

From: Borja Marcos <borjam@sarenet.es>
To: Martin Matuska <mm@FreeBSD.org>
Cc: bug-followup@FreeBSD.org,
 Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Date: Thu, 4 Aug 2011 17:37:26 +0200

 On Aug 4, 2011, at 4:18 PM, Martin Matuska wrote:
 
 > But I still think we don't have to prefetch data we are not processing.
 >
 > I don't think that there will be any ugly interaction in this case.
 > The idea of the prefetch code is to speed up access to the data
 > structure by caching it into memory.
 > So what we don't prefetch (is not cached) will be read the normal way
 > (and not from cache).
 >
 > If you follow its history, you can see it well:
 >
 > Prefetch for zfs list was introduced in OpenSolaris changeset 8415 and
 > didn't change very much since that point:
 > http://hg.openindiana.org/illumos-gate/rev/d5525cd1cbc2
 >
 > If you remove that code, it will still work the way it should, but slower :)
 > I still see no problem in not-prefetching hidden datasets.
 
 Understood :) Thank you very much. As I said, I'm not that familiar with
 the internals.
 
 I'm going to try the patch and will let you know the outcome. I guess
 it will effectively fix it.
 
 
 
 Best regards,
 
 
 
 
 Borja.
 

From: Borja Marcos <borjam@sarenet.es>
To: Martin Matuska <mm@FreeBSD.org>
Cc: freebsd-fs@FreeBSD.org,
 bug-followup@FreeBSD.org
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Date: Fri, 5 Aug 2011 11:15:07 +0200

 On Aug 4, 2011, at 4:20 PM, Martin Matuska wrote:
 
 > If you remove that code, it will still work the way it should, but slower :)
 > I still see no problem in not-prefetching hidden datasets.
 
 Yep, the patch seems to work perfectly. I've been trying to trigger the
 issue and there's no way; it seems to be solved.
 
 
 
 Borja.
 

From: Martin Matuska <mm@FreeBSD.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind
 temporary clones
Date: Wed, 10 Aug 2011 21:08:36 +0200

 This is a multi-part message in MIME format.
 --------------020705050701020904030304
 Content-Type: text/plain; charset=ISO-8859-1
 Content-Transfer-Encoding: 7bit
 
 I now have an alternative patch (it does almost the same, but in a
 different place).
 
 -- 
 Martin Matuska
 FreeBSD committer
 http://blog.vx.sk
 
 
 --------------020705050701020904030304
 Content-Type: text/plain;
  name="dmu_objset.c.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="dmu_objset.c.patch"
 
 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c
 ===================================================================
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c	(revision 224760)
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c	(working copy)
 @@ -1760,10 +1760,29 @@
  dmu_objset_prefetch(const char *name, void *arg)
  {
  	dsl_dataset_t *ds;
 +	char *cp;
  
 +	/*
 +	 * If the objset starts with a '%', then ignore it.
 +	 * These hidden datasets are always inconsistent and by not opening
 +	 * them here, we can avoid a race with dsl_dir_destroy_check().
 +	 */
 +	cp = strrchr(name, '/');
 +	if (cp && cp[1] == '%')
 +		return (0);
 +
  	if (dsl_dataset_hold(name, FTAG, &ds))
  		return (0);
  
 +	/*
 +	 * If the objset is in an inconsistent state (eg, in the process
 +	 * of being destroyed), don't prefetch it.
 +	 */
 +	if (ds->ds_phys->ds_flags & DS_FLAG_INCONSISTENT) {
 +		dsl_dataset_rele(ds, FTAG);
 +		return (0);
 +	}
 +
  	if (!BP_IS_HOLE(&ds->ds_phys->ds_bp)) {
  		mutex_enter(&ds->ds_opening_lock);
  		if (ds->ds_objset == NULL) {
 
 --------------020705050701020904030304--

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/157728: commit references a PR
Date: Sat, 13 Aug 2011 10:59:07 +0000 (UTC)

 Author: mm
 Date: Sat Aug 13 10:58:53 2011
 New Revision: 224814
 URL: http://svn.freebsd.org/changeset/base/224814
 
 Log:
   Fix race between dmu_objset_prefetch() invoked from
   zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly
   invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not
   prefetching temporary clones, as these count as always inconsistent.
   In addition, do not prefetch hidden datasets at all as we are not
   going to process these later.
   
   Filed as Illumos Bug #1346
   
   PR:		kern/157728
   Tested by:	Borja Marcos <borjam@sarenet.es>, mm
   Reviewed by:	pjd
   Approved by:	re (kib)
   MFC after:	1 week
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	Sat Aug 13 10:43:56 2011	(r224813)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	Sat Aug 13 10:58:53 2011	(r224814)
 @@ -1964,7 +1964,8 @@ top:
  		int len = sizeof (zc->zc_name) - (p - zc->zc_name);
  
  		while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0)
 -			(void) dmu_objset_prefetch(zc->zc_name, NULL);
 +			if (dataset_name_hidden(zc->zc_name) == B_FALSE)
 +				(void) dmu_objset_prefetch(zc->zc_name, NULL);
  	}
  
  	do {
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/157728: commit references a PR
Date: Sat, 20 Aug 2011 07:43:25 +0000 (UTC)

 Author: mm
 Date: Sat Aug 20 07:43:10 2011
 New Revision: 225022
 URL: http://svn.freebsd.org/changeset/base/225022
 
 Log:
   MFC r224814, r224855:
   
   MFC r224814 [1]:
   Fix race between dmu_objset_prefetch() invoked from
   zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly
   invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not
   prefetching temporary clones, as these count as always inconsistent.
   In addition, do not prefetch hidden datasets at all as we are not
   going to process these later.
   
   Filed as Illumos Bug #1346
   
   MFC r224855:
   zfs_ioctl.c: improve code readability in zfs_ioc_dataset_list_next()
   
   zvol.c: fix calling of dmu_objset_prefetch() in zvol_create_minors()
   by passing full instead of relative dataset name and prefetching all
   visible datasets to be processed later instead of just the pool name
   
   PR:		kern/157728 [1]
   Tested by:	Borja Marcos <borjam@sarenet.es> [1], mm
   Reviewed by:	pjd
 
 Modified:
   stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
   stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
 
 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
 ==============================================================================
 --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	Sat Aug 20 06:08:31 2011	(r225021)
 +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	Sat Aug 20 07:43:10 2011	(r225022)
 @@ -1963,8 +1963,10 @@ top:
  		uint64_t cookie = 0;
  		int len = sizeof (zc->zc_name) - (p - zc->zc_name);
  
 -		while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0)
 -			(void) dmu_objset_prefetch(zc->zc_name, NULL);
 +		while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0) {
 +			if (!dataset_name_hidden(zc->zc_name))
 +				(void) dmu_objset_prefetch(zc->zc_name, NULL);
 +		}
  	}
  
  	do {
 
 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
 ==============================================================================
 --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c	Sat Aug 20 06:08:31 2011	(r225021)
 +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c	Sat Aug 20 07:43:10 2011	(r225022)
 @@ -2200,11 +2200,11 @@ zvol_create_minors(const char *name)
  	p = osname + strlen(osname);
  	len = MAXPATHLEN - (p - osname);
  
 -	if (strchr(name, '/') == NULL) {
 -		/* Prefetch only for pool name. */
 -		cookie = 0;
 -		while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0)
 -			(void) dmu_objset_prefetch(p, NULL);
 +	/* Prefetch the datasets. */
 +	cookie = 0;
 +	while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0) {
 +		if (!dataset_name_hidden(osname))
 +			(void) dmu_objset_prefetch(osname, NULL);
  	}
  
  	cookie = 0;
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->closed 
State-Changed-By: mm 
State-Changed-When: Mon Aug 22 10:15:16 UTC 2011 
State-Changed-Why:  
Resolved. Thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=157728 
>Unformatted:
