From nobody@FreeBSD.org  Mon Oct 24 15:13:54 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B3216106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 24 Oct 2011 15:13:54 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id A1A6B8FC0A
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 24 Oct 2011 15:13:54 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p9OFDsxd043580
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 24 Oct 2011 15:13:54 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p9OFDshs043546;
	Mon, 24 Oct 2011 15:13:54 GMT
	(envelope-from nobody)
Message-Id: <201110241513.p9OFDshs043546@red.freebsd.org>
Date: Mon, 24 Oct 2011 15:13:54 GMT
From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         161968
>Category:       kern
>Synopsis:       [zfs] [hang] renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-fs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 24 15:20:00 UTC 2011
>Closed-Date:    
>Last-Modified:  Wed Jun 19 14:40:00 UTC 2013
>Originator:     Peter Maloney
>Release:        8.2-STABLE FreeBSD 8.2-STABLE #0: Tue Sep 27 16:27:57 CEST 2011     root@bcnastest2.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64
>Organization:
Brockmann Consult
>Environment:
FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 15:06:03 CEST 2011     root@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64

>Description:
Renaming a snapshot with -r, when the rename includes a zvol snapshot, causes a total ZFS freeze/lockup/deadlock.

Once it is locked up, any command that uses "zfs", "zpool" or "sysctl -a" freezes, as does any access to NFS exports. "shutdown -r" does not restart the system either; it only shuts down to the point where it reports that all disks are synced.

Pressing CTRL+T while a zfs or zpool command hangs shows the state "spa_namespace_lock". Pressing it while "sysctl -a" hangs shows the state "g_waitfor_event".
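
The state is the bracketed field in each CTRL+T status line. A minimal sketch for pulling that field out of saved status lines, assuming the line layout shown in this report (the function name wchan_of is hypothetical, not an existing tool):

```shell
# wchan_of LINE
# Print the wait channel, i.e. the text between '[' and ']', of one
# CTRL+T (SIGINFO) status line. Prints nothing if the line does not
# have the "load: ... cmd: ... [state] ..." layout.
wchan_of() {
    printf '%s\n' "$1" |
        sed -n 's/^load: .*cmd: .* \[\(.*\)] .*$/\1/p'
}

# Sample line taken from this report.
wchan_of 'load: 0.00  cmd: zfs 58403 [spa_namespace_lock] 119.06r 0.00u 0.00s 0% 1796k'
# prints: spa_namespace_lock
```

Run over a saved terminal log, this makes it easy to see which commands are stuck on spa_namespace_lock and which on another state.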

Most of the time, a simple "zfs rename" does not cause a lockup. However, on one system, renaming one specific snapshot always causes a lockup, and on every other 8-STABLE system I have, my script always causes a lockup after a few loops.

My FreeBSD 8-STABLE was installed as 8.2-RELEASE plus the mps driver, and then updated with cvsup using this supfile (comments removed):

*default host=cvsup.de.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=RELENG_8
*default delete use-rel-suffix
*default date=2011.09.27.00.00.00
*default compress
src-all

(and the same freeze result occurs with date changed to today, Oct. 24th)

# zpool get all big
NAME  PROPERTY       VALUE       SOURCE
big   size           39.8G       -
big   capacity       24%         -
big   altroot        -           default
big   health         ONLINE      -
big   guid           14576708073682355899  default
big   version        28          default
big   bootfs         -           default
big   delegation     on          default
big   autoreplace    on          local
big   cachefile      -           default
big   failmode       continue    local
big   listsnapshots  on          local
big   autoexpand     off         default
big   dedupditto     0           default
big   dedupratio     1.00x       -
big   free           30.1G       -
big   allocated      9.64G       -
big   readonly       off         -

# zfs get all big
NAME  PROPERTY              VALUE                  SOURCE
big   type                  filesystem             -
big   creation              Thu Jul 21 11:48 2011  -
big   used                  4.80G                  -
big   available             14.7G                  -
big   referenced            4.80G                  -
big   compressratio         1.00x                  -
big   mounted               yes                    -
big   quota                 none                   default
big   reservation           none                   default
big   recordsize            128K                   default
big   mountpoint            /big                   default
big   sharenfs              off                    default
big   checksum              on                     default
big   compression           off                    default
big   atime                 on                     default
big   devices               on                     default
big   exec                  on                     default
big   setuid                on                     default
big   readonly              off                    default
big   jailed                off                    default
big   snapdir               visible                local
big   aclmode               discard                default
big   aclinherit            restricted             default
big   canmount              on                     default
big   xattr                 off                    temporary
big   copies                1                      default
big   version               4                      -
big   utf8only              off                    -
big   normalization         none                   -
big   casesensitivity       sensitive              -
big   vscan                 off                    default
big   nbmand                off                    default
big   sharesmb              off                    default
big   refquota              none                   default
big   refreservation        none                   default
big   primarycache          all                    default
big   secondarycache        all                    default
big   usedbysnapshots       0                      -
big   usedbydataset         4.80G                  -
big   usedbychildren        6.70M                  -
big   usedbyrefreservation  0                      -
big   logbias               latency                default
big   dedup                 off                    default
big   mlslabel                                     -
big   sync                  standard               default
big   refcompressratio      1.00x                  -

# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
big                        4.80G  14.7G  4.80G  /big
big@testcrashsnap4             0      -  4.80G  -
zroot                      5.64G   109G   894M  legacy
zroot/tmp                  2.14M   109G  2.14M  /tmp
zroot/usr                  4.72G   109G  2.45G  /usr
zroot/usr/home             53.5K   109G  53.5K  /usr/home
zroot/usr/obj               922M   109G   922M  /usr/objtmp
zroot/usr/ports            1.07G   109G   941M  /usr/ports
zroot/usr/ports/distfiles   150M   109G   150M  /usr/ports/distfiles
zroot/usr/ports/packages     21K   109G    21K  /usr/ports/packages
zroot/usr/src               314M   109G   314M  /usr/src
zroot/var                  17.6M   109G   904K  /var
zroot/var/crash            22.5K   109G  22.5K  /var/crash
zroot/var/db               16.2M   109G  15.1M  /var/db
zroot/var/db/pkg           1.10M   109G  1.10M  /var/db/pkg
zroot/var/empty              21K   109G    21K  /var/empty
zroot/var/log               272K   109G   272K  /var/log
zroot/var/mail               48K   109G    48K  /var/mail
zroot/var/run                50K   109G    50K  /var/run
zroot/var/tmp                23K   109G    23K  /var/tmp

# cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"

/etc/sysctl.conf is nothing but comments

On a virtual machine running 8.2-RELEASE (not STABLE), I have not been able to reproduce the problem.

I also tested the latest sources fetched with cvsup today, which freeze the same way.

All my zfs systems are amd64.


I was hoping to use a zvol for iSCSI together with snapshots, so simply avoiding snapshots on zvols is not an acceptable workaround.
>How-To-Repeat:
Prerequisite: 

A system running 8.2-STABLE (more specifically using *default date=2011.09.27.00.00.00 in cvsup).


(1) Create a zpool.

[root@bcnastest2 ~]# zpool status big
  pool: big
 state: ONLINE
 scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        big           ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            ad8       ONLINE       0     0     0
            ad10      ONLINE       0     0     0
            ad12      ONLINE       0     0     0
            ad16      ONLINE       0     0     0
        cache
          gpt/cache0  ONLINE       0     0     0

errors: No known data errors

(2) Create a zvol in the above zpool.

[root@bcnastest2 ~]# zfs create -V 100m big/testzvol

(3) Run this script as root (it works in both bash and sh; make sure to set the dataset variable).

#-------begin script-------
# Pool (or dataset) whose snapshots are created, renamed and destroyed.
dataset=big

count=0

while true; do
    echo Snapshot
    # Remove leftovers from an earlier run; a failure here is expected and ignored.
    zfs destroy -r ${dataset}@testcrashsnap >/dev/null 2>&1
    zfs snapshot -r ${dataset}@testcrashsnap || break

    current=""
    for next in 1 2 3 4 5; do
        echo Renaming from ${current} to ${next}
        zfs destroy -r ${dataset}@testcrashsnap${next} >/dev/null 2>&1
        # "break 2" stops the outer loop as well; a plain "break" would only leave the for loop.
        zfs rename -r ${dataset}@testcrashsnap${current} ${dataset}@testcrashsnap${next} || break 2
        current=${next}
    done

    echo Destroy
    zfs destroy -r ${dataset}@testcrashsnap${current} || break
    # POSIX arithmetic instead of the bash-only "let count++".
    count=$((count + 1))
    echo $count
done
#-------end script-------




Result: After an arbitrary number of loops, the output stops. Below is the output, including the result of pressing CTRL+C, CTRL+Z and CTRL+T. The script was started on a Friday; the last CTRL+T line was produced on the following Monday.

============================================
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
1
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
2
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
3
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
^C
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 5.56r 0.00u 0.00s 0% 1696k
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.07r 0.00u 0.00s 0% 1696k
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.26r 0.00u 0.00s 0% 1696k
load: 1.46  cmd: zfs 2363 [tx->tx_sync_done_cv)] 13.42r 0.00u 0.00s 0% 1696k
^C^C^C
load: 1.89  cmd: zfs 2363 [tx->tx_sync_done_cv)] 36.59r 0.00u 0.00s 0% 1696k



^C^D


load: 0.01  cmd: zfs 2363 [tx->tx_sync_done_cv)] 230096.99r 0.00u 0.00s 0% 1696k
============================================
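
Because a hung zfs command never returns, an unattended run of a loop like the one above blocks forever. A minimal sketch of a deadline wrapper in plain sh follows. The function name run_with_deadline and the exit status 124 are assumptions chosen for illustration, not part of any FreeBSD tool. A process sleeping inside the kernel, as in this report, may not react to the signal, so the wrapper only reports the hang and does not wait for the stuck process.

```shell
# run_with_deadline SECONDS CMD [ARGS...]
# Run CMD in the background and poll it once per second.
# Returns CMD's exit status if it finishes in time, or 124 (an
# arbitrary value picked for this sketch) if it is still running
# after SECONDS.
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            # Best effort only: a process blocked in the kernel may ignore this.
            kill "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The command has exited; collect and return its status.
    wait "$pid"
}

# Hypothetical use inside the loop from the script in this report:
#   run_with_deadline 60 zfs rename -r big@testcrashsnap big@testcrashsnap1 ||
#       { echo "rename failed or hung"; exit 1; }
```

Polling with kill -0 avoids depending on a timeout utility, which may not be available on every system under test.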


>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-amd64->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Oct 25 13:15:29 UTC 2011 
Responsible-Changed-Why:  
reclassify and assign. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=161968 

From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: bug-followup@FreeBSD.org, peter.maloney@brockmann-consult.de
Cc:  
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including
 a zvol snapshot causes total ZFS freeze/lockup
Date: Fri, 10 Feb 2012 12:11:41 +0100

 I tested this again using 8-STABLE (csup'd on 2012-01-04):
 
 FreeBSD bczfsvm1.bc.local 8.2-STABLE-20120104 FreeBSD
 8.2-STABLE-20120104 #0: Mon Feb  6 12:10:32 UTC 2012    
 root@bczfsvm1.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64
 
 on hardware:
 
 DELL PowerEdge 2850  - tested with a zfs stripe, raidz1, and raidz2
 and a SuperMicro dual xeon system - tested with a zfs mirror
 
 And it didn't hang.
 
 Now there are just brief pauses every 3-5 loops (instead of hangs?).
 
 So if someone tests this in 9.0-STABLE and finds that it doesn't hang,
 this PR should be closed.
 
 -- 
 
 --------------------------------------------
 Peter Maloney
 Brockmann Consult
 Max-Planck-Str. 2
 21502 Geesthacht
 Germany
 Tel: +49 4152 889 300
 Fax: +49 4152 889 333
 E-mail: peter.maloney@brockmann-consult.de
 Internet: http://www.brockmann-consult.de
 --------------------------------------------
 

From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: bug-followup@FreeBSD.org, peter.maloney@brockmann-consult.de
Cc:  
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including
 a zvol snapshot causes total ZFS freeze/lockup
Date: Mon, 13 Feb 2012 09:56:54 +0100

 Correction: the newly tested version was csup'd on 2012-02-04 (February,
 not January).
 
 -- 
 
 --------------------------------------------
 Peter Maloney
 Brockmann Consult
 Max-Planck-Str. 2
 21502 Geesthacht
 Germany
 Tel: +49 4152 889 300
 Fax: +49 4152 889 333
 E-mail: peter.maloney@brockmann-consult.de
 Internet: http://www.brockmann-consult.de
 --------------------------------------------
 

From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: bug-followup@FreeBSD.org, peter.maloney@brockmann-consult.de
Cc:  
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including
 a zvol snapshot causes total ZFS freeze/lockup
Date: Mon, 11 Jun 2012 10:23:58 +0200

 I've tested this in 8.3-RELEASE, and 8.3-STABLE pulled last week. *Both
 hang*, even though 8.2-STABLE in Feb 2012 did not hang.
 
 First terminal:
 
 Snapshot
 Renaming from to 1
 load: 0.00  cmd: zfs 57149 [tx->tx_sync_done_cv)] 104.03r 0.00u 0.00s 0% 1920k
 
 
 Second:
 # zfs list
 load: 0.00  cmd: zfs 58403 [spa_namespace_lock] 119.06r 0.00u 0.00s 0% 1796k

From: Richard Yao <ryao@gentoo.org>
To: bug-followup@FreeBSD.org
Cc: peter.maloney@brockmann-consult.de
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including
 a zvol snapshot causes total ZFS freeze/lockup
Date: Wed, 13 Jun 2012 14:17:33 -0400

 I tried to reproduce this issue after being contacted about it by Peter
 on freenode. I had to modify his script to destroy datasets
 individually instead of recursively, by running the following command:
 
 for i in $(zfs list -t snapshot -H -o name | grep testcrashsnap${current}); do zfs destroy $i; done;
 
 Otherwise, a "dataset is busy" failure occurs. This occurs on both
 FreeBSD and Linux.
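
The grep above matches by substring, so testcrashsnap1 would also match a snapshot named testcrashsnap10. A sketch of an exact-match filter, assuming each snapshot name contains a single "@" (snapshots_named is a hypothetical helper name, not part of zfs):

```shell
# snapshots_named NAME
# Read snapshot names (one per line, e.g. pool/fs@snap) on stdin and
# print only those whose part after "@" is exactly NAME.
snapshots_named() {
    # Appending "" forces a string comparison even for numeric-looking names.
    awk -F@ -v want="$1" '($2 "") == (want "")'
}

# Hypothetical use with the per-snapshot destroy described above:
#   zfs list -t snapshot -H -o name |
#       snapshots_named "testcrashsnap${current}" |
#       while read -r snap; do zfs destroy "$snap"; done
```

Reading the names from stdin keeps the filter independent of zfs, so it can be checked against a saved snapshot list before anything is destroyed.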
 
 After doing that (and changing dataset=big to dataset=rpool), I was able
 to test his script in virtual machines running Gentoo FreeBSD 9.0-RELEASE
 and Gentoo Linux. I reproduced this issue on Gentoo FreeBSD 9.0-RELEASE.
 On the other hand, Gentoo Linux successfully completed 6570 iterations.
 This was with the ZFSOnLinux kernel modules. The code is available on
 github:
 
 https://github.com/zfsonlinux/zfs
 
 The actual code that I used to test was a patched version that I develop
 in a separate branch. You can find it here:
 
 https://github.com/ryao/zfs/tree/gentoo
 
 My current focus is on ZFS support in Gentoo Linux, but I would be happy
 to help my FreeBSD counterparts troubleshoot this. Please do not
 hesitate to contact me with questions.

From: Shane Ambler <FreeBSD@ShaneWare.Biz>
To: bug-followup@FreeBSD.org, peter.maloney@brockmann-consult.de
Cc:  
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including
 a zvol snapshot causes total ZFS freeze/lockup
Date: Thu, 19 Jul 2012 01:59:46 +0930

 I am running 9.0-RELEASE-p1 amd64 with a clang-built world and get a hang
 with zfs rename -r. Anything already running mostly keeps running (top
 started before the rename will hang), but any (uncached?) disk access
 causes running programs to hang as well. No new programs can start, not
 even a console login, so I need to hard reset.
 
 Hardware is an ASUS P8H61-M LE/USB3 with a Core i5 and 8GB RAM, using a
 WD10EARS-00Y5B1 (WD Green 1TB SATA2).
 It is partitioned with one 64k boot partition and one zfs partition, as a
 single-disk zpool. The volume I have is allocated to swap.
 
 The commands I used for the volume are -
 zfs create -V 16G zrp/swap0
 zfs set org.freebsd:swap=on zrp/swap0
 zfs set copies=1 zrp/swap0
 
 From a clean pool with no snapshots -
 zfs snapshot -r zrp@daily.01 -- works
 zfs rename -r zrp@daily.01 zrp@daily.02 -- hangs
 
 Alternatively -
 zfs snapshot -r zrp@daily.01 -- works
 zfs rename -r zrp/swap0@daily.01 zrp/swap0@daily.02 -- works
 zfs rename -r zrp@daily.01 zrp@daily.02 -- works
 zfs rename -r zrp@daily.02 zrp@daily.03 -- hangs - now renames vol
 
 

From: Paavo Pokkinen <paavopok@ee.oulu.fi>
To: bug-followup@FreeBSD.org, peter.maloney@brockmann-consult.de
Cc:  
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including
 a zvol snapshot causes total ZFS freeze/lockup
Date: Mon, 03 Sep 2012 22:16:46 +0300

 I can confirm this exists on 9.1-RC1. Simply issuing "zfs snapshot -r
 pakka@test" followed by "zfs rename -r pakka@test pakka@test2" hangs.
 The filesystem appears to continue working, but all zfs commands start
 hanging. Only a reboot helps.
 
 CTRL-T gives me:
 load: 0.00  cmd: zfs 1629 [tx->tx_sync_done_cv)] 412.62r 0.00u 0.00s 0% 2640k
 
 The hardware is a Pentium G630T with 8G RAM on an ASUS P8H77-I; the four
 disks are WD Red 3TB. The zpool is raidz1, and I'm using it on plain disks
 without partitioning. I also have swap on a zvol, which apparently also
 gets snapshotted.
 
 --
 Paavo Pokkinen

From: Paavo Pokkinen <paavopok@ee.oulu.fi>
To: bug-followup@FreeBSD.org, peter.maloney@brockmann-consult.de
Cc:  
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including
 a zvol snapshot causes total ZFS freeze/lockup
Date: Mon, 03 Sep 2012 23:25:00 +0300

 I did some testing, and it appears that, at least in my case, the hang is
 related to the presence of zvols. I removed my swap zvol, and renaming
 snapshots then worked fine. Then I recreated the zvol (without running
 swapon), and the hang appeared just like before.
 
 --
 Paavo Pokkinen

From: "Steven Hartland" <smh@freebsd.org>
To: <bug-followup@freebsd.org>,
	"Peter Maloney" <peter.maloney@brockmann-consult.de>
Cc:  
Subject: Re: kern/161968: [zfs] [hang] renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup
Date: Wed, 19 Jun 2013 15:23:13 +0100

 I've reproduced this here. The cause is a live lock between the zvol GEOM
 actions and ZFS itself, involving the following two locks:
 db> show sleepchain
 thread 100553 (pid 6, txg_thread_enter) blocked on sx "spa_namespace_lock" XLOCK
 thread 100054 (pid 2, g_event) blocked on sx "dp->dp_config_rwlock" XLOCK
 
 db>     
 Tracing pid 2 tid 100054 td 0xffffff001c1d4470
 sched_switch() at sched_switch+0x153
 mi_switch() at mi_switch+0x1f8
 sleepq_switch() at sleepq_switch+0x123
 sleepq_wait() at sleepq_wait+0x4d
 _sx_slock_hard() at _sx_slock_hard+0x1e2
 _sx_slock() at _sx_slock+0xc9
 dsl_dir_open_spa() at dsl_dir_open_spa+0xab
 dsl_dataset_hold() at dsl_dataset_hold+0x3b
 dsl_dataset_own() at dsl_dataset_own+0x2f
 dmu_objset_own() at dmu_objset_own+0x36
 zvol_first_open() at zvol_first_open+0x34
 zvol_geom_access() at zvol_geom_access+0x2df
 g_access() at g_access+0x1ba
 g_part_taste() at g_part_taste+0xc4
 g_new_provider_event() at g_new_provider_event+0xaa
 g_run_events() at g_run_events+0x250
 fork_exit() at fork_exit+0x135
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff92070a2bb0, rbp = 0 ---
 db> bt 100553
 Tracing pid 6 tid 100553 td 0xffffff002c2308e0
 sched_switch() at sched_switch+0x153
 mi_switch() at mi_switch+0x1f8
 sleepq_switch() at sleepq_switch+0x123
 sleepq_wait() at sleepq_wait+0x4d
 _sx_xlock_hard() at _sx_xlock_hard+0x296
 _sx_xlock() at _sx_xlock+0xb7
 zvol_rename_minors() at zvol_rename_minors+0x75
 dsl_dataset_snapshot_rename_sync() at dsl_dataset_snapshot_rename_sync+0x141
 dsl_sync_task_group_sync() at dsl_sync_task_group_sync+0x14e
 dsl_pool_sync() at dsl_pool_sync+0x47d
 spa_sync() at spa_sync+0x34a
 txg_sync_thread() at txg_sync_thread+0x139
 fork_exit() at fork_exit+0x135
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff920e61abb0, rbp = 0 ---
 
 The following steps recreate the issue on stable/8 r251496
 
 gpart create -s GPT da3
 gpart add -t freebsd-zfs da3
 zpool create -f testpool da3p1
 zfs create -V 150m testpool/testvol
 zfs snapshot -r testpool@snap
 zfs rename -r testpool@snap testpool@snap-new
 
 I've been unable to reproduce on current r251471.
 
 I'm not sure whether this is a timing issue caused by the significant
 changes to ZFS sync tasks in current, or whether the issue really doesn't
 exist any more.
 
     Regards
     Steve
>Unformatted:
