From sjr@comcast.net  Sun Jul 16 01:55:02 2006
Return-Path: <sjr@comcast.net>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A93E316A4DD
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 16 Jul 2006 01:55:02 +0000 (UTC)
	(envelope-from sjr@comcast.net)
Received: from alnrmhc11.comcast.net (alnrmhc13.comcast.net [206.18.177.53])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 34E5243D45
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 16 Jul 2006 01:55:02 +0000 (GMT)
	(envelope-from sjr@comcast.net)
Received: from istari.comcast.net (c-69-139-159-113.hsd1.md.comcast.net[69.139.159.113](misconfigured sender))
          by comcast.net (alnrmhc13) with ESMTP
          id <20060716015501b1300fpgjke>; Sun, 16 Jul 2006 01:55:01 +0000
Received: from istari.comcast.net (localhost [127.0.0.1])
	by istari.comcast.net (8.13.6/8.13.6) with ESMTP id k6G1t0VF076032
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 15 Jul 2006 21:55:00 -0400 (EDT)
	(envelope-from sjr@istari.comcast.net)
Received: (from sjr@localhost)
	by istari.comcast.net (8.13.6/8.13.6/Submit) id k6G1t0dU076031;
	Sat, 15 Jul 2006 21:55:00 -0400 (EDT)
	(envelope-from sjr)
Message-Id: <200607160155.k6G1t0dU076031@istari.comcast.net>
Date: Sat, 15 Jul 2006 21:55:00 -0400 (EDT)
From: "Stephen J. Roznowski" <sjr@comcast.net>
Reply-To: "Stephen J. Roznowski" <sjr@comcast.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: snapshots on busy filesystem fail
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         100365
>Category:       kern
>Synopsis:       snapshots on busy filesystem fail
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kib
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jul 16 02:00:31 GMT 2006
>Closed-Date:    Tue Oct 03 07:42:18 GMT 2006
>Last-Modified:  Tue Oct 03 07:42:18 GMT 2006
>Originator:     Stephen J. Roznowski
>Release:        FreeBSD 6.1-STABLE amd64
>Organization:
>Environment:
System: FreeBSD 6.1-STABLE FreeBSD 6.1-STABLE #3: Sat Jul 8 22:34:39 EDT 2006


	
>Description:
	Taking snapshots while creating/deleting directories leads
	to corrupted snapshots.
>How-To-Repeat:

I'm running the following script as root:

	#!/bin/sh

	FS=/usr/ports

	for i in 1 2 3 4 5 6 7 8 9 10
	do
		echo $i
		mksnap_ffs $FS $FS/.snap/snapshot
		fsck_ffs $FS/.snap/snapshot
		/bin/rm -f $FS/.snap/snapshot
	done

If the filesystem is quiet, the script completes without errors.

In another window, I'm running the following:

	# mkdir /usr/ports/.a
	# cd /usr/ports/.a
	# while (1)
	> mkdir a b c d e f g h i j k l m n o p
	> rmdir a b c d e f g h i j k l m n o p
	> sleep 1
	> end

Now, when I run the previous script, fsck occasionally reports that the
snapshot it is checking is corrupt. [This appears to happen about 50% of
the time.]

Additionally, I'm seeing the following error:

	mksnap_ffs: Cannot create /usr/ports/.snap/snapshot: Resource temporarily unavailable
	/usr/ports/.snap/snapshot is not a disk device

and dmesg shows:

  fsync: giving up on dirty
  0xffffff002fcee9b0: tag devfs, type VCHR
      usecount 1, writecount 0, refcount 635 mountedhere 0xffffff0000e7c600
      flags ()
      v_object 0xffffff002ff5a460 ref 0 pages 2556
       lock type devfs: EXCL (count 1) by thread 0xffffff002ac52260 (pid 71385)
          dev ad0s1f
  fsync: giving up on dirty
  0xffffff002fcee9b0: tag devfs, type VCHR
      usecount 1, writecount 0, refcount 634 mountedhere 0xffffff0000e7c600
      flags ()
      v_object 0xffffff002ff5a460 ref 0 pages 2552
       lock type devfs: EXCL (count 1) by thread 0xffffff002ac52260 (pid 71385)
          dev ad0s1f

I can provide more system configuration details if needed.

>Fix:


>Release-Note:
>Audit-Trail:

From: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
To: sjr@comcast.net
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/100365: snapshots on busy filesystem fail
Date: Mon, 21 Aug 2006 12:57:32 +0000 (UTC)

 It looks like snapshots don't work well on amd64.  Making a snapshot
 while 'make world' is running in the background and mounting it multiple
 times shows that it isn't stable:
 
 
 # rm -f /usr/.snap/snapshot
 # mksnap_ffs /usr /usr/.snap/snapshot
 # mdconfig -a -t vnode -f /usr/.snap/snapshot -u 0 -o readonly
 # mount -r /dev/md0 /mnt
 # ls -lisdtT /mnt/src/make.world
 1860701 11792 -rw-r--r--  1 root  bin  12050746 Aug 19 23:21:33 2006 /mnt/src/make.world
 # umount /mnt
 # mount -r /dev/md0 /mnt
 # ls -lisdtT /mnt/src/make.world 
 1860701 12528 -rw-r--r--  1 root  bin  12801326 Aug 19 23:23:03 2006 /mnt/src/make.world
 
 The check at the start of ffs_copyonwrite() for whether the write is to a
 snapshot file or not is faulty when the write is a metadata update.  In that
 case, the vnode associated with the buffer doesn't have an inode, but instead a
 devfs_dirent structure.
 
 Memory beyond the end of the related devfs_dirent structure is incorrectly
 interpreted as ufs inode flags for those metadata updates.
 
 On RELENG_6/amd64, what is interpreted as i_flags is really the start of the
 device name in the dirent structure following the devfs_dirent structure,
 content typically 0x73306164 (da0s), triggering the failure.
 
 On HEAD/i386, what is interpreted as i_flags is beyond the end of both the
 devfs_dirent structure and the following dirent structure, content typically
 zero, not triggering the failure.
 
 - Tor Egge
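The arithmetic behind the analysis above can be checked with a short sh
sketch. It decodes the quoted word 0x73306164 in little-endian byte order
and tests it against the snapshot flag bit. The value used for SF_SNAPSHOT
(0x00200000, as in sys/stat.h) is an assumption not stated in this PR, and
the function names are hypothetical. The sketch only illustrates why a
misread device name could look like a snapshot inode; it is not the kernel
code.

```shell
#!/bin/sh
# WORD is the value quoted in the message above.
# SF_SNAPSHOT is an assumption (value believed to match sys/stat.h).
WORD=0x73306164
SF_SNAPSHOT=0x00200000

# Print the four bytes of a 32-bit word in little-endian (in-memory) order.
le32_to_ascii() {
    i=0
    out=
    while [ "$i" -lt 4 ]; do
        byte=$(( ($1 >> (8 * i)) & 0xff ))
        # Emit the byte through an octal escape, which printf handles portably.
        out="$out$(printf "\\$(printf '%03o' "$byte")")"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Succeed if the (assumed) snapshot flag bit is set in the given word.
has_snapshot_bit() {
    [ $(( $1 & $SF_SNAPSHOT )) -ne 0 ]
}

le32_to_ascii "$WORD"
if has_snapshot_bit "$WORD"; then
    echo "snapshot bit set: the write would be mistaken for a snapshot update"
fi
```

Under the same assumption, a word of zero (the HEAD/i386 case described
above) does not have the bit set, which is consistent with the failure not
being triggered there.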
State-Changed-From-To: open->patched 
State-Changed-By: kib 
State-Changed-When: Mon Aug 21 17:06:17 UTC 2006 
State-Changed-Why:  
A proposed fix for the problem was committed as rev. 1.128 
of the file sys/ufs/ffs/ffs_snapshot.c. Please test and report the results. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100365 
Responsible-Changed-From-To: freebsd-bugs->kib 
Responsible-Changed-By: kib 
Responsible-Changed-When: Mon Aug 21 17:22:05 UTC 2006 
Responsible-Changed-Why:  
Grab this one. I already committed the fix. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100365 
State-Changed-From-To: patched->feedback 
State-Changed-By: kib 
State-Changed-When: Mon Sep 25 08:45:28 UTC 2006 
State-Changed-Why:  
The fix was committed to both CURRENT and STABLE. Several people have 
reported that the problem goes away after the update, but the 
originator of the PR says his troubles are still there. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100365 

From: "Stephen J. Roznowski" <sjr@comcast.net>
To: kib@FreeBSD.org
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/100365: snapshots on busy filesystem fail
Date: Fri, 29 Sep 2006 07:51:12 -0400 (EDT)

 On 25 Sep, Konstantin Belousov wrote:
 > Synopsis: snapshots on busy filesystem fail
 > 
 > State-Changed-From-To: patched->feedback
 > State-Changed-By: kib
 > State-Changed-When: Mon Sep 25 08:45:28 UTC 2006
 > State-Changed-Why: 
 > The fix was committed to both CURRENT and STABLE. I got several reports from
 > people that state that the problem goes away after the update. But the
 > originator of the PR said his troubles are still there.
 > 
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=100365
 
 I rebuilt my entire system after this commit and the problem is still
 occurring [ffs_snapshot.c 1.103.2.17]. Running the test from the original
 submission returns:
 
 ...
 mksnap_ffs: Cannot create /usr/ports/.snap/snapshot: Resource temporarily unavailable
 /usr/ports/.snap/snapshot is not a disk device
 
 and dmesg shows:
 
 fsync: giving up on dirty
 0xffffff00302711f0: tag devfs, type VCHR
     usecount 1, writecount 0, refcount 634 mountedhere 0xffffff0000e4aa00
     flags ()
     v_object 0xffffff003e169460 ref 0 pages 2588
      lock type devfs: EXCL (count 1) by thread 0xffffff002393c000 (pid 2305)
         dev ad0s1f
 fsync: giving up on dirty
 0xffffff00302711f0: tag devfs, type VCHR
     usecount 1, writecount 0, refcount 634 mountedhere 0xffffff0000e4aa00
     flags ()
     v_object 0xffffff003e169460 ref 0 pages 2588
      lock type devfs: EXCL (count 1) by thread 0xffffff002393c000 (pid 2305)
         dev ad0s1f
 
 
 Thanks,
 -SR
 -- 
 Stephen J. Roznowski    (sjr@comcast.net)

From: Kostik Belousov <kostikbel@gmail.com>
To: "Stephen J. Roznowski" <sjr@comcast.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/100365: snapshots on busy filesystem fail
Date: Mon, 2 Oct 2006 11:21:45 +0300

 --oLBj+sq0vYjzfsbl
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Fri, Sep 29, 2006 at 07:51:12AM -0400, Stephen J. Roznowski wrote:
 > I rebuilt my entire system after this commit and the problem is still
 > occurring [ffs_snapshot.c 1.103.2.17]. Running the test from the original
 > submission returns:
 >
 > ...
 > mksnap_ffs: Cannot create /usr/ports/.snap/snapshot: Resource temporarily unavailable
 > /usr/ports/.snap/snapshot is not a disk device
 >
 > and dmesg shows:
 >
 > fsync: giving up on dirty
 > 0xffffff00302711f0: tag devfs, type VCHR
 >     usecount 1, writecount 0, refcount 634 mountedhere 0xffffff0000e4aa00
 >     flags ()
 >     v_object 0xffffff003e169460 ref 0 pages 2588
 >      lock type devfs: EXCL (count 1) by thread 0xffffff002393c000 (pid 2305)
 >         dev ad0s1f
 > fsync: giving up on dirty
 > 0xffffff00302711f0: tag devfs, type VCHR
 >     usecount 1, writecount 0, refcount 634 mountedhere 0xffffff0000e4aa00
 >     flags ()
 >     v_object 0xffffff003e169460 ref 0 pages 2588
 >      lock type devfs: EXCL (count 1) by thread 0xffffff002393c000 (pid 2305)
 >         dev ad0s1f
 
 EAGAIN is a valid return code for snapshotting. It means that the FS was
 too busy at the time you tried to snapshot it; the snapshot was not made
 and you should retry later.
 
 What I want is feedback about the issues of corrupted snapshots and of
 snapshot files that change their size during fs activity.
 
 --oLBj+sq0vYjzfsbl
 Content-Type: application/pgp-signature
 Content-Disposition: inline
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.5 (FreeBSD)
 
 iD8DBQFFIMwZC3+MBN1Mb4gRAuQdAJ9O/gW4CAcI24SQsjDCt0McJVgqZwCfY1Vp
 LbQlJYLeYqlzyokmW+hdCW4=
 =FfG9
 -----END PGP SIGNATURE-----
 
 --oLBj+sq0vYjzfsbl--
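The advice above to retry after EAGAIN ("Resource temporarily
unavailable") could be scripted roughly as follows. This is a minimal
sketch: the function name, the MKSNAP override variable, and the default
retry count and delay are all hypothetical. It also assumes that
mksnap_ffs does not report EAGAIN through a distinct exit status, so it
retries after any failure.

```shell
#!/bin/sh
# MKSNAP is a hypothetical override hook; by default the real command is used.
MKSNAP=${MKSNAP:-mksnap_ffs}

# snapshot_with_retry FILESYSTEM SNAPSHOT [TRIES] [DELAY_SECONDS]
# Retries on ANY failure, since EAGAIN is assumed not to be distinguishable
# from other errors by exit status alone.
snapshot_with_retry() {
    fs=$1
    snap=$2
    tries=${3:-5}      # assumed default, not from the PR
    delay=${4:-10}     # assumed default, not from the PR
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        # Same invocation form as the script in How-To-Repeat.
        if $MKSNAP "$fs" "$snap"; then
            return 0
        fi
        echo "snapshot attempt $attempt of $tries failed" >&2
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

With the paths from the original report, a call would look like
`snapshot_with_retry /usr/ports /usr/ports/.snap/snapshot`. A persistent
failure still needs to be investigated rather than retried indefinitely,
which is why the number of attempts is bounded.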
State-Changed-From-To: feedback->closed 
State-Changed-By: kib 
State-Changed-When: Tue Oct 3 07:41:56 UTC 2006 
State-Changed-Why:  
The bug is fixed. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100365 
>Unformatted:
