From nobody@FreeBSD.org  Tue May  4 18:26:27 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2F06F16A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Tue,  4 May 2004 18:26:27 -0700 (PDT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 047D243D2D
	for <freebsd-gnats-submit@FreeBSD.org>; Tue,  4 May 2004 18:26:27 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id i451QQlX076620
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 4 May 2004 18:26:26 -0700 (PDT)
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id i451QQg4076619;
	Tue, 4 May 2004 18:26:26 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200405050126.i451QQg4076619@www.freebsd.org>
Date: Tue, 4 May 2004 18:26:26 -0700 (PDT)
From: Dane Foster <dene@slush.ca>
To: freebsd-gnats-submit@FreeBSD.org
Subject: dump causes machine freeze
X-Send-Pr-Version: www-2.3

>Number:         66270
>Category:       kern
>Synopsis:       [hang] dump(8) causes machine freeze
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          suspended
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue May 04 18:30:20 PDT 2004
>Closed-Date:    
>Last-Modified:  Mon Dec  5 00:00:21 UTC 2011
>Originator:     Dane Foster
>Release:        5.2.1
>Organization:
>Environment:
FreeBSD homer.connect.com.fj 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #0: Tue May  4 22:38:11 FJT 2004     root@homer.connect.com.fj:/usr/src/sys/i386/compile/IBM  i386

>Description:
      dump (called via amanda) with snapshots on a large, volatile filesystem
causes the machine to freeze. No errors are reported.
>How-To-Repeat:
      Call dump via amanda (and likely on its own) on the filesystem while
it is experiencing any activity.
>Fix:
      
>Release-Note:
>Audit-Trail:

From: Kris Kennaway <kris@obsecurity.org>
To: Dane Foster <dene@slush.ca>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Tue, 04 May 2004 22:04:44 -0700

 This PR contains even less information than your non-PR email, but what 
 you're describing sounds like intended behaviour.  Specifically, 
 creating a snapshot causes all other filesystem access to pause until 
 the creation is complete, which can take a long time on large or active 
 filesystems.
 
 Can you please confirm that if you wait long enough the operation completes?
 
 Kris

From: dane foster <dene@slush.ca>
To: Kris Kennaway <kris@obsecurity.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Wed, 5 May 2004 17:41:36 +1200

 Sorry, a bit more detail.
 
 First, I'm using the snapshot feature of dump (dump L). Second, approx. 
 30 seconds after I start the dump, the entire machine freezes 
 completely. No network, no keyboard. It requires a hard reset. This 
 happens on two separate machines.
 
 It's a bit difficult to debug, as there's no error message before/after 
 the machine freezes. Also, this is a production machine (mail server 
 and db host). Mucking about with it is sort of frowned upon.
 
 If there's any info I can provide, let me know and I'll see what I can 
 do. Crashing either host more than once or twice a week is bad though 
 ;)
 
 /dane
 
 

From: dane foster <dene@slush.ca>
To: Kris Kennaway <kris@obsecurity.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Wed, 5 May 2004 17:53:30 +1200

 One more thing before I forget, I've made two small modifications to 
 dump's main.c
 One mod to default to always snapshot, and one mod to hardcode the path 
 to mksnap_ffs, I'll attach the diff
 
 I appreciate you looking into this, and once again, any more info I can 
 provide, just let me know.
 
 Attachment: main.diff
 
 --- main.c      Sun Apr 25 13:30:06 2004
 +++ main.c.orig Sun Nov 16 20:01:58 2003
 @@ -75,7 +75,7 @@
  #include "pathnames.h"
  
  int    notify = 0;     /* notify operator flag */
 -int    snapdump = 1;   /* dumping live filesystem, so use snapshot */
 +int    snapdump = 0;   /* dumping live filesystem, so use snapshot */
  int    blockswritten = 0;      /* number of blocks written on current tape */
  int    tapeno = 0;     /* current tape number */
  int    density = 0;    /* density in bytes/0.1" " <- this is for hilit19 */
 @@ -322,7 +322,7 @@
                         snprintf(snapname, sizeof snapname,
                             "%s/.snap/dump_snapshot", mntpt);
                         snprintf(snapcmd, sizeof snapcmd,
 -                           "/sbin/mksnap_ffs %s %s", mntpt, snapname);
 +                           "mksnap_ffs %s %s", mntpt, snapname);
                         unlink(snapname);
                         if (system(snapcmd) != 0)
                                 errx(X_STARTUP, "Cannot create %s: %s\n",
 

From: Kris Kennaway <kris@obsecurity.org>
To: dane foster <dene@slush.ca>
Cc: Kris Kennaway <kris@obsecurity.org>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Tue, 4 May 2004 22:55:30 -0700

 On Wed, May 05, 2004 at 05:41:36PM +1200, dane foster wrote:
 > Sorry, a bit more detail.
 > 
 > First, Im using the snapshot feature of dump (dump L). Second, approx. 
 > 30 seconds after i start the dump, the entire machine freezes 
 > completely. No network, no keyboard. it requires a hard reset. This 
 > happens on two separate machines.
 
 As I explained, snapshots cause non-snapshot disk access to stall.
 Are you certain that's not what you're seeing?
 
 Kris
Responsible-Changed-From-To: freebsd-bugs->mckusick 
Responsible-Changed-By: kris 
Responsible-Changed-When: Tue May 4 23:06:19 PDT 2004 
Responsible-Changed-Why:  
Assign to kirk, as this may be a snapshot-related deadlock. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=66270 

From: dane foster <dene@slush.ca>
To: Kris Kennaway <kris@obsecurity.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Wed, 5 May 2004 18:01:15 +1200

 Yes, sure.
 
 It was initially working fine when i was running backups at 1am or so, 
 I ran a test backup at 8:30am once (machine under more load) and it 
 froze, and never recovered. This has been the case since..
 
 /dane
 
 On May 5, 2004, at 5:55 PM, Kris Kennaway wrote:
 
 > On Wed, May 05, 2004 at 05:41:36PM +1200, dane foster wrote:
 >> Sorry, a bit more detail.
 >>
 >> First, Im using the snapshot feature of dump (dump L). Second, approx.
 >> 30 seconds after i start the dump, the entire machine freezes
 >> completely. No network, no keyboard. it requires a hard reset. This
 >> happens on two separate machines.
 >
 > As I explained, snapshots cause non-snapshot disk access to stall.
 > Are you certain that's not what you're seeing?
 >
 > Kris
 

From: Kris Kennaway <kris@obsecurity.org>
To: dane foster <dene@slush.ca>
Cc: Kris Kennaway <kris@obsecurity.org>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Tue, 4 May 2004 23:05:59 -0700

 On Wed, May 05, 2004 at 06:01:15PM +1200, dane foster wrote:
 > Yes, sure.
 > 
 > It was initially working fine when i was running backups at 1am or so, 
 > I ran a test backup at 8:30am once (machine under more load) and it 
 > froze, and never recovered. This has been the case since..
 
 It's possible you're seeing a deadlock in the snapshot code.  I'm
 turning the PR over to kirk, the author of the snapshot code.
 
 Kris

From: dane foster <dene@slush.ca>
To: Kris Kennaway <kris@obsecurity.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Wed, 5 May 2004 18:07:31 +1200

 An addendum, sorry, getting late and I'm getting forgetful.
 
 non-snapshot disk access freezes during snapshot creation only, from 
 what i understand. At most this should take about 5 seconds. I can run 
 mksnap_ffs on the filesystem in question whenever I'd like without 
 problem, only when trying to dump that snapshot does this happen.
 
 /dane
 
 On May 5, 2004, at 5:55 PM, Kris Kennaway wrote:
 
 > On Wed, May 05, 2004 at 05:41:36PM +1200, dane foster wrote:
 >> Sorry, a bit more detail.
 >>
 >> First, Im using the snapshot feature of dump (dump L). Second, approx.
 >> 30 seconds after i start the dump, the entire machine freezes
 >> completely. No network, no keyboard. it requires a hard reset. This
 >> happens on two separate machines.
 >
 > As I explained, snapshots cause non-snapshot disk access to stall.
 > Are you certain that's not what you're seeing?
 >
 > Kris
 

From: dane foster <dene@slush.ca>
To: Kris Kennaway <kris@obsecurity.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/66270: dump causes machine freeze
Date: Wed, 5 May 2004 18:09:31 +1200

 Well, once more, I apologize for my initial lack of information and 
 thank you for spending the time gathering more. If there's any more 
 info I should gather and have ready for Kirk please advise.
 
 /dane
 
 
 On May 5, 2004, at 6:05 PM, Kris Kennaway wrote:
 
 > On Wed, May 05, 2004 at 06:01:15PM +1200, dane foster wrote:
 >> Yes, sure.
 >>
 >> It was initially working fine when i was running backups at 1am or so,
 >> I ran a test backup at 8:30am once (machine under more load) and it
 >> froze, and never recovered. This has been the case since..
 >
 > It's possible you're seeing a deadlock in the snapshot code.  I'm
 > turning the PR over to kirk, the author of the snapshot code.
 >
 > Kris
 >
 

From: "Steve Watt" <steve@Watt.COM>
To: <freebsd-gnats-submit@FreeBSD.org>, <dene@slush.ca>
Cc:  
Subject: Re: kern/66270: [hang] dump causes machine freeze
Date: Mon, 13 Dec 2004 08:14:32 -0800

 I would like to chime in with a "me too" on this PR.  I've seen tight lockups
 (can't get in to DDB) once a week, during my Monday morning backup.  The other
 6 days of the week don't lock up; my level 1 happens Sunday morning, the level
 3 is Monday morning.  It always locks up on the filesystem that has the INN
 databases.
 
 I had *one* incidence about three weeks ago where I could get in to DDB on the
 console, did a little poking around, and when I attempted to dump, it locked
 up tight, so I couldn't get a traceback.  From vague memory, I remember a
 number of processes waiting on "ufs", and the innd was involved in the
 deadlock.
 
 I'm running 5.3-STABLE, updated 19 November around 20:05 PST.
 -- 
 Steve Watt  KD6GGD  PP-ASEL-IA  ICBM:  121W 56' 57.8" / 37N 20' 14.9"
 Internet:  steve @ Watt.COM                   Whois: SW32
 Free time?  There's no such thing.  It just comes in varying prices...
 

From: "Eugene" <genie@geniechka.ru>
To: <bug-followup@FreeBSD.org>, <dene@slush.ca>, <mckusick@FreeBSD.org>
Cc:  
Subject: Re: kern/66270: [hang] dump causes machine freeze
Date: Thu, 23 Mar 2006 04:37:59 +0300

 Hi!
 
 I wonder what is the status of this PR?
 I seem to have the same problem with 5.3-RELEASE.
 
 Before this started, the system worked nicely for more than a year, and then it 
 started to hang during dump (seemingly just after mksnap_ffs), first 
 intermittently and now reproducibly (might an increase in data size be a 
 factor? The disk is only about 18% full, though).
 Originally it hung and the console showed complaints about increasing 
 PMAP_SHPGPERPROC and maxproc.
 After some discussion on freebsd-fs, I adjusted these parameters and now it just 
 hangs without complaining =((
 
 Any suggestions?
 
 Thank you in advance
 Eugene 
 

From: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
To: genie@geniechka.ru
Cc: bug-followup@FreeBSD.org, dene@slush.ca, mckusick@FreeBSD.org,
        truckman@FreeBSD.org
Subject: Re: kern/66270: [hang] dump causes machine freeze
Date: Tue, 28 Mar 2006 02:10:48 +0000 (UTC)

 > I wonder what is the status of this PR?
 > I seem to have the same problem with 5.3-RELEASE.
 
 There are several bugs in 5.3-RELEASE that can lead to a deadlock when creating
 a snapshot.
 
 The deadlocks are typically caused by starvation (runningbufspace write
 throttling blocking copyonwrite processing), lock order reversals not detected
 by WITNESS (variants of snaplk versus buffer locks, snaplk versus normal vnode
 locks and vn_start_write() versus vnode locks) or incomplete flushing of dirty
 data when suspending the file systems (unsafe vnode list traversal and missing
 detection of need to retry vnode flush loop in ffs_sync()).
 
 Fixes for most of the known bugs have been committed to -current and merged back
 to RELENG_6.  They have not been merged back to RELENG_5.
 
 There are still unfixed bugs when a file system containing a snapshot becomes
 full.
 
 > Before this started, system worked nicely for more than a year and then it 
 > started to hang during dump (seemingly just after mksnap_ffs) first 
 > intermittently and now reproducibly (might increase in data size be a 
 > factor? though disk is only about 18% full).
 
 File system activity and layout affect the probability of a deadlock.  If a
 directory constantly being updated uses the same inodeblock as the directory
 containing the snapshot file then unlinking the snapshot file is more likely to
 deadlock (cf. revision 1.275 of ufs_vnops.c).
 
 > Any suggestions?
 
 Start planning for upgrade to 6.1-RELEASE.
 
 - Tor Egge
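
The lock order reversals described above can be illustrated outside the kernel. What follows is a minimal user-space sketch using POSIX threads, not FreeBSD code: snap_lock and buf_lock are hypothetical stand-ins for snaplk and a buffer lock, and take_in_order and reversal_would_block are illustrative helper names. It uses pthread_mutex_trylock so that the reversal is reported instead of actually hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two kernel locks; this is not FreeBSD code. */
static pthread_mutex_t snap_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Lock `first`, then try `second` without blocking.
 * Returns true if both were obtained (both are released again),
 * false if `second` was busy, i.e. a blocking lock would have waited.
 */
static bool take_in_order(pthread_mutex_t *first, pthread_mutex_t *second)
{
    bool ok;
    int rc = pthread_mutex_lock(first);

    assert(rc == 0);
    ok = (pthread_mutex_trylock(second) == 0);
    if (ok)
        pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
    return ok;
}

/*
 * Simulate a reversal: path A already holds snap_lock while path B,
 * which takes buf_lock first, now asks for snap_lock.
 * Returns true when path B would block. If path A then asked for
 * buf_lock with a blocking call, neither path could make progress.
 */
static bool reversal_would_block(void)
{
    bool blocked;

    pthread_mutex_lock(&snap_lock);                   /* path A: snap_lock first */
    blocked = !take_in_order(&buf_lock, &snap_lock);  /* path B: buf_lock first */
    pthread_mutex_unlock(&snap_lock);
    return blocked;
}
```

When neither lock is held, take_in_order succeeds in either order; the problem only appears once two code paths hold one lock each and disagree on the order, which is why such hangs depend on load and timing.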
Responsible-Changed-From-To: mckusick->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun May 18 22:57:56 UTC 2008 
Responsible-Changed-Why:  
Reassign from Kirk as per his request. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=66270 
State-Changed-From-To: open->suspended 
State-Changed-By: vwe 
State-Changed-When: Fri Aug 20 09:51:31 UTC 2010 
State-Changed-Why:  
Release 5.2.1 has been out of support for a while. 
We're suspending this PR, as we think it might not be an issue with more recent releases. 
We'll be happy to re-open it if someone reproduces the problem on a supported branch. 
For that, we would love to see WITNESS output. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=66270 

From: Коньков Евгений <kes-kes@yandex.ru>
To: bug-followup@FreeBSD.org, dene@slush.ca
Cc:  
Subject: Re: kern/66270: [hang] dump(8) causes machine freeze
Date: Mon, 5 Dec 2011 01:57:33 +0200

 Hi
 
 dump -0L -f - /usr | gzip -2 | ssh -c blowfish usr@host dd of=/usr/sharedzone/snap/dump-usr.gz
 
 freezes the system for about 1 to 2 hours.
 /usr is about 1.6 TB in size.
 
 All services that do not access /usr work fine: I can connect
 to the mysql and firebird databases, but I cannot ssh to this host (I think
 ssh reads something from /usr).
 
 If I am already logged in to this machine through ssh, I cannot do anything:
 I press a key and see it echoed after 10-15 minutes.
 
 -- 
 mailto:kes-kes@yandex.ru
 
>Unformatted:
