From root@wallenda.spg.more.net  Mon Apr  2 16:44:53 2007
Return-Path: <root@wallenda.spg.more.net>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 522E716A401
	for <FreeBSD-gnats-submit@freebsd.org>; Mon,  2 Apr 2007 16:44:53 +0000 (UTC)
	(envelope-from root@wallenda.spg.more.net)
Received: from wallenda.spg.more.net (wallenda.spg.more.net [204.185.42.133])
	by mx1.freebsd.org (Postfix) with ESMTP id 3341013C44C
	for <FreeBSD-gnats-submit@freebsd.org>; Mon,  2 Apr 2007 16:44:53 +0000 (UTC)
	(envelope-from root@wallenda.spg.more.net)
Received: by wallenda.spg.more.net (Postfix, from userid 0)
	id 75A4D5C3C; Mon,  2 Apr 2007 11:13:33 -0500 (CDT)
Message-Id: <20070402161333.75A4D5C3C@wallenda.spg.more.net>
Date: Mon,  2 Apr 2007 11:13:33 -0500 (CDT)
From: dan@more.net
Reply-To: dan@more.net
To: FreeBSD-gnats-submit@freebsd.org
Cc: dan@more.net
Subject: fsck fails on 6T filesystem
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         111146
>Category:       bin
>Synopsis:       [2tb] fsck(8) fails on 6T filesystem
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-fs
>State:          suspended
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 02 16:50:00 GMT 2007
>Closed-Date:    
>Last-Modified:  Mon May 18 04:33:27 UTC 2009
>Originator:     Dan D Niles
>Release:        FreeBSD 6.2-RELEASE-p3 i386
>Organization:
MOREnet - Missouri Research and Education Network
>Environment:
System: FreeBSD hostname 6.2-RELEASE-p3 FreeBSD 6.2-RELEASE-p3 #5: Wed Mar 28 07:44:39 CDT 2007 root@hostname:/usr/obj/usr/src/sys/BIG_MEM i386

>Description:
     I have a 6T filesystem on a server that crashed.  I cannot fsck 
the filesystem.

# fsck -t ufs -y /dev/da0
fsck_ufs: cannot alloc 1993797728 bytes for inoinfo

I also tried:

# fsck -t ufs -f -p /dev/da0
/dev/da0: UNKNOWN FILE TYPE I=11895232
/dev/da0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

I built a custom kernel with MAXDSIZ and DFLDSIZ just under 3G, and got
the same results.  fsck had about 430M in use when it failed, so the
total would be about 2332M, which is less than the size allowed (as
reported by limits).
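
The arithmetic above can be checked with a short sketch. It assumes "M"
means MiB, takes the 1993797728-byte request from the fsck_ufs error
message, and uses the 2935808 kB data segment limit that ulimit reports
later in this thread. The variable and function names are illustrative only.

```python
# Sketch: does the failed allocation fit under the data segment limit?
# Assumptions: "M" means MiB; the limit is the 2935808 kB shown by ulimit.
MIB = 1024 * 1024

in_use_mib = 430             # approximate usage when fsck failed
request_bytes = 1993797728   # from "cannot alloc ... bytes for inoinfo"
limit_kb = 2935808           # data seg size from ulimit


def total_after_alloc_mib(in_use_mib, request_bytes):
    """Memory already in use plus the failed request, in MiB."""
    return in_use_mib + request_bytes / MIB


total_mib = total_after_alloc_mib(in_use_mib, request_bytes)
limit_mib = limit_kb / 1024

print(f"total after allocation: {total_mib:.0f} MiB")
print(f"data segment limit:     {limit_mib:.0f} MiB")
print("fits under limit:", total_mib < limit_mib)
```

On these numbers the request appears to fit under the configured limit,
which is why the limit alone does not seem to explain the failure.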

NOTE:  I have temporarily replaced the server.  For a short time I have the
crashed filesystem available for testing and debugging code.  I have
a core dump from the fsck.

>How-To-Repeat:
   On a 6T filesystem that has crashed, run:
	fsck -t ufs -y /dev/da0
>Fix:

>Release-Note:
>Audit-Trail:

From: Astrodog <astrodog@gmail.com>
To: bug-followup@FreeBSD.org, dan@more.net
Cc:  
Subject: Re: bin/111146: fsck fails on 6T filesystem
Date: Wed, 4 Apr 2007 08:13:57 -0500

 How much memory do you have in this system? I've found that there is a
 minimum amount of memory required to fsck large filesystems.
 
 --- Harrison Grundy
 
From: Dan D Niles <dan@more.net>
To: Astrodog <astrodog@gmail.com>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6T filesystem
Date: Wed, 04 Apr 2007 09:29:29 -0500

 I only have 3G at the moment, but fsck is failing when the resulting
 memory usage would be 2.3G.  I have MAXDSIZ and DFLDSIZ set to 2.8G.
 I have 2G of swap space, none of which gets used.
 
 I'm getting a little pressure to reformat the array.  Is there any
 debugging you would like me to do?
 
 Thanks for your response,
 
 Dan D Niles
 
From: Jan Srzednicki <w@wrzask.pl>
To: bug-followup@FreeBSD.org, dan@more.net
Cc:  
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Sun, 8 Apr 2007 21:24:55 +0200

 Hi,
 
 First of all, show the output of both "ulimit -Sa" and "ulimit -Ha". It
 is possible that you may need to raise the soft limit manually.
 
 If the values are all right, try running fsck with strace/truss and show
 the result.
 
 -- 
   Jan Srzednicki  ::  http://wrzask.pl/
   "Remember, remember, the fifth of November"
                                      -- V for Vendetta
 

From: Dan D Niles <dan@more.net>
To: Jan Srzednicki <w@wrzask.pl>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 11:12:46 -0500

 # ulimit -Sa
 core file size        (blocks, -c) unlimited
 data seg size         (kbytes, -d) 2935808
 file size             (blocks, -f) unlimited
 max locked memory     (kbytes, -l) unlimited
 max memory size       (kbytes, -m) unlimited
 open files                    (-n) 11095
 pipe size          (512 bytes, -p) 1
 stack size            (kbytes, -s) 65536
 cpu time             (seconds, -t) unlimited
 max user processes            (-u) 5547
 virtual memory        (kbytes, -v) unlimited
 
 # ulimit -Ha
 core file size        (blocks, -c) unlimited
 data seg size         (kbytes, -d) 2935808
 file size             (blocks, -f) unlimited
 max locked memory     (kbytes, -l) unlimited
 max memory size       (kbytes, -m) unlimited
 open files                    (-n) 11095
 pipe size          (512 bytes, -p) 1
 stack size            (kbytes, -s) 65536
 cpu time             (seconds, -t) unlimited
 max user processes            (-u) 5547
 virtual memory        (kbytes, -v) unlimited
 
 I've ordered a SCSI card to move the raid device to a server that I can
 bring up to 8G of ram.  I'm hoping the card gets here before I need to
 give the array back.
 
 I'll run fsck with truss and see what I find out.
 
 Thanks,
 
 Dan
 

From: Dan D Niles <dan@more.net>
To: Jan Srzednicki <w@wrzask.pl>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 11:27:10 -0500

 On Sun, 2007-04-08 at 21:24 +0200, Jan Srzednicki wrote:
 > 
 > If the values are all right, try running fsck with strace/truss and show
 > the result.
 > 
 
 I added a debugging print statement to fsck_ffs, and sent it a SIGINFO
 every two seconds.   Here is the tail of the output, and the tail of the
 truss output.
 
 It seems to be allocating space for < 10k inodes at a time until it
 fails.  When it fails it is trying to allocate space for 1.5G inodes.
 Is that normal?
 
 /dev/da0: phase 1: cyl group 2223 of 33666 (6%)
 Trying to calloc space for 2240 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 448 inodes
 Trying to calloc space for 6208 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 768 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 448 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 448 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 448 inodes
 Trying to calloc space for 4032 inodes
 Trying to calloc space for 6208 inodes
 Trying to calloc space for 1664 inodes
 /dev/da0: phase 1: cyl group 2252 of 33666 (6%)
 Trying to calloc space for 3584 inodes
 /dev/da0: phase 1: cyl group 2253 of 33666 (6%)
 Trying to calloc space for 448 inodes
 Trying to calloc space for 3648 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 4352 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 5376 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 448 inodes
 Trying to calloc space for 384 inodes
 Trying to calloc space for 448 inodes
 Trying to calloc space for 1572191256 inodes
 fsck_ffs: cannot alloc 1993797728 bytes for inoinfo
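
One way to relate the two numbers in the last lines above: if each inode
needs a 4-byte state record (an assumption about fsck_ffs's per-inode
bookkeeping, not confirmed in this thread) and the size is computed in
32-bit arithmetic, as it would be on i386, then 1572191256 records wrap
around to exactly the byte count in the error message. A minimal sketch,
with a hypothetical helper name:

```python
# Sketch: 32-bit wraparound of (inode count * record size).
# RECORD_SIZE = 4 is an assumption about fsck_ffs's per-inode state record.
RECORD_SIZE = 4
UINT32_MOD = 1 << 32


def wrapped_alloc_size(ninodes, record_size=RECORD_SIZE):
    """Return (full-width size, size truncated to 32 bits) in bytes."""
    true_size = ninodes * record_size
    return true_size, true_size % UINT32_MOD


true_size, reported = wrapped_alloc_size(1572191256)
print("full-width request:", true_size)
print("truncated to 32 bits:", reported)
```

Under that assumption the real request would be close to 5.9 GiB, which
could not fit in an i386 address space whatever MAXDSIZ is set to.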
 
 
  1919: break(0x22ab2000)                         = 0 (0x0)
  1919: break(0x22ab3000)                         = 0 (0x0)
  1919: lseek(4,0x6570640000,SEEK_SET)            = 1885601792 (0x70640000)
  1919: read(4,"\M-mA\^D\0\M-k\^C\0\0\M-j\^C\0\0"...,65536) = 65536 (0x10000)
  1919: lseek(4,0x657bdf0000,SEEK_SET)            = 2078212096 (0x7bdf0000)
  1919: read(4,"\0\0\0\0U\^B\t\0004\^[\^EF\M-V\b"...,16384) = 16384 (0x4000)
  1919: write(1,"Trying to calloc space for 448 i"...,38) = 38 (0x26)
  1919: lseek(4,0x657bdf4000,SEEK_SET)            = 2078228480 (0x7bdf4000)
  1919: read(4,"\M-mA\^B\0\M-k\^C\0\0\M-j\^C\0\0"...,65536) = 65536 (0x10000)
  1919: break(0x22ab4000)                         = 0 (0x0)
  1919: lseek(4,0x657be04000,SEEK_SET)            = 2078294016 (0x7be04000)
  1919: read(4,"\0\0\0\0000\0\0\0000\0\0\0\0\0\0"...,65536) = 65536 (0x10000)
  1919: lseek(4,0x65875b4000,SEEK_SET)            = -2024062976 (0x875b4000)
  1919: read(4,"\0\0\M-'\M-K,\M^H\M-:\M-Q*\^C\0"...,16384) = 16384 (0x4000)
  1919: write(1,"Trying to calloc space for 15721"...,45) = 45 (0x2d)
  1919: write(2,"fsck_ffs: ",10)                  = 10 (0xa)
  1919: write(2,"cannot alloc 1993797728 bytes fo"...,41) = 41 (0x29)
  1919: write(2,"\n",1)                           = 1 (0x1)
  1919: exit(0x8)
  1919: process exit, rval = 2048
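
In the truss output above, the value printed after "=" for each lseek
matches the low 32 bits of the requested 64-bit offset, shown as a signed
number. That appears to be why the last seek looks as if it returned a
negative value. A small sketch of that decoding, with a hypothetical helper
name:

```python
def low32_signed(offset):
    """Low 32 bits of an offset, read as a signed 32-bit integer."""
    low = offset & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


# Offsets copied from the lseek calls in the truss output.
for off in (0x6570640000, 0x657bdf0000, 0x65875b4000):
    print(hex(off), low32_signed(off), hex(off & 0xFFFFFFFF))
```

The full offset 0x65875b4000 is about 406 GiB into the device, so the seek
target itself is plausible; only the printed return value looks truncated.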
 
 

From: Jan Srzednicki <w@wrzask.pl>
To: Dan D Niles <dan@more.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 9 Apr 2007 21:48:52 +0200

 > It seems to be allocating space for < 10k inodes at a time until it
 > fails.  When it fails it is trying to allocate space for 1.5G inodes.
 > Is that normal?
 
 Check with dumpfs how many inodes are there in your filesystem.
 
 -- 
   Jan Srzednicki  ::  http://wrzask.pl/
   "Remember, remember, the fifth of November"
                                      -- V for Vendetta
 

From: Dan D Niles <dan@more.net>
To: Jan Srzednicki <w@wrzask.pl>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 15:09:28 -0500

 On Mon, 2007-04-09 at 21:48 +0200, Jan Srzednicki wrote:
 > Check with dumpfs how many inodes are there in your filesystem.
 
 dumpfs seg-faulted and dumped core.  It spit out this info before core
 dumping:
 
 magic   19540119 (UFS2) time    Wed Mar 28 14:00:00 2007
 superblock location     65536   id      [ 43d90071 e579e310 ]
 ncg     33666   size    3167475584      blocks  3067823920
 bsize   16384   shift   14      mask    0xffffc000
 fsize   2048    shift   11      mask    0xfffff800
 frag    8       shift   3       fsbtodb 2
 minfree 8%      optim   time    symlinklen 120
 maxbsize 16384  maxbpg  2048    maxcontig 8     contigsumsize 8
 nbfree  159788467       ndir    2581658 nifree  784218256       nffree  1488762
 bpg     11761   fpg     94088   ipg     23552
 nindir  2048    inopb   64      maxfilesize     140806241583103
 sbsize  2048    cgsize  16384   csaddr  3000    cssize  540672
 sblkno  40      cblkno  48      iblkno  56      dblkno  3000
 cgrotor 28218   fmod    0       ronly   0       clean   0
 avgfpdir 64     avgfilesize 16384
 flags   unclean 
 fsmnt   /LSO
 volname         swuid   0
 
 cs[].cs_(nbfree,ndir,nifree,nffree):
         (4606,234,23288,6) (3955,223,23288,24) (80,0,23223,753) (3,226,23298,8)
         (16,87,23338,81) (3,227,23298,7) (2436,185,23340,19) (4330,891,21577,21)
         (3971,170,23288,6) (1967,186,23336,33) (1812,177,23342,48) (6639,199,23324,50)
         (6084,236,23288,16) (5213,224,23300,16) (5211,232,23287,19) (6042,237,23288,8)
         (5213,236,23288,11) (5213,237,23288,10) (6120,237,23288,59) (1363,226,23298,219)
         (5193,235,23288,60) (4,227,23298,8) (3059,197,23298,30) (5218,199,23288,9)
         (6137,363,22338,9) (5221,174,23288,9) (5213,200,23288,48) (4323,199,23288,42)
 [clipped]
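
The superblock fields printed above are enough for a rough consistency
check. The sketch below assumes that "size" counts fragments of "fsize"
bytes and that every cylinder group holds "ipg" inodes. Those are the usual
UFS meanings of these fields, but they are stated here as assumptions.

```python
# Values copied from the dumpfs output above.
ncg = 33666          # number of cylinder groups
ipg = 23552          # inodes per cylinder group
size = 3167475584    # filesystem size, assumed to be in fragments
fsize = 2048         # fragment size in bytes

total_inodes = ncg * ipg
total_bytes = size * fsize
bytes_per_inode = total_bytes / total_inodes

print("total inodes:   ", total_inodes)
print("size in TiB:    ", round(total_bytes / 2**40, 2))
print("bytes per inode:", round(bytes_per_inode))
```

That gives roughly 793 million inodes for the whole filesystem, about half
of the 1572191256 that fsck tried to allocate space for.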
 
 

From: Jan Srzednicki <w@wrzask.pl>
To: Dan D Niles <dan@more.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 9 Apr 2007 22:13:36 +0200

 On Mon, Apr 09, 2007 at 03:09:28PM -0500, Dan D Niles wrote:
 > On Mon, 2007-04-09 at 21:48 +0200, Jan Srzednicki wrote:
 > > Check with dumpfs how many inodes are there in your filesystem.
 > 
 > dumpfs seg-faulted and dumped core.  It spit out this info before core
 > dumping:
 
 That's kinda strange, dumpfs never did that to me. It appears to me that
 this filesystem has got quite severely corrupted. Did you try newfs on
 it?
 
 And another thing: try tuning up the -i, -f and -b parameters to newfs.
 I assume that on such a big filesystem average filesize will be much
 bigger than the "UNIX default" (10k), so you can safely set these to
 their maximums (and allocate inodes more sparsely).
 
 -- 
   Jan Srzednicki  ::  http://wrzask.pl/
   "Remember, remember, the fifth of November"
                                      -- V for Vendetta
 

From: Dan D Niles <dan@more.net>
To: Jan Srzednicki <w@wrzask.pl>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 09 Apr 2007 15:30:23 -0500

 On Mon, 2007-04-09 at 22:13 +0200, Jan Srzednicki wrote:
 > That's kinda strange, dumpfs never did that to me. It appears to me that
 > this filesystem has got quite severely corrupted. Did you try newfs on
 > it?
 
 Not yet.  I'd like to figure out why I can't fsck it first.  Running
 newfs on your backup disk is not a viable solution.  There is data I
 cannot pull off the disk.  If my primary storage had crashed also, I'd be
 hosed.
 
 > And another thing: try tuning up the -i, -f and -b parameters to newfs.
 > I assume that on such a big filesystem average filesize will be much
 > bigger than the "UNIX default" (10k), so you can safely set these to
 > their maximums (and allocate inodes more sparsely).
 
 Running df reports 8683374 inodes used and 784218256 free.  This could
 be wrong since the filesystem is dirty and mounted ro.
 
 FreeBSD's newfs scales things automatically, though perhaps not enough:
 
 tunefs: maximum blocks per file in a cylinder group: (-e)  2048
 tunefs: average file size: (-f)                            16384
 tunefs: average number of files in a directory: (-s)       64
 tunefs: minimum percentage of free space: (-m)             8%
 tunefs: optimization preference: (-o)                      time
 
 

From: Jan Srzednicki <w@wrzask.pl>
To: Dan D Niles <dan@more.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 9 Apr 2007 22:39:52 +0200

 On Mon, Apr 09, 2007 at 03:30:23PM -0500, Dan D Niles wrote:
 > On Mon, 2007-04-09 at 22:13 +0200, Jan Srzednicki wrote:
 > > That's kinda strange, dumpfs never did that to me. It appears to me that
 > > this filesystem has got quite severely corrupted. Did you try newfs on
 > > it?
 > 
 > Not yet.  I'd like to figure out why I can't fsck it first.  Running
 > newfs on your backup disk is not a viable solution.  There is data I
 > cannot pull off the disk.  If my primary storage had crashed also, I'd be
 > hosed.
 
 Well, you need to take into account that your data may be hosed.
 Back up your primary storage NOW. :)
 
 > > And another thing: try tuning up the -i, -f and -b parameters to newfs.
 > > I assume that on such a big filesystem average filesize will be much
 > > bigger than the "UNIX default" (10k), so you can safely set these to
 > > their maximums (and allocate inodes more sparsely).
 > 
 > Running df reports 8683374 inodes used and 784218256 free.  This could
 > be wrong since the filesystem is dirty and mounted ro.
 > 
 > FreeBSD's newfs scales things automatically, though perhaps not enough:
 
 It does not scale anything. Last time I checked (a few years ago) even
 the -g option did not make any difference either, so I had to tune
 things up manually with -i, -f and -b.
 
 > tunefs: maximum blocks per file in a cylinder group: (-e)  2048
 > tunefs: average file size: (-f)                            16384
 > tunefs: average number of files in a directory: (-s)       64
 > tunefs: minimum percentage of free space: (-m)             8%
 > tunefs: optimization preference: (-o)                      time
 
 These are the default values for any filesystem, regardless of its size.
 
 -- 
   Jan Srzednicki  ::  http://wrzask.pl/
   "Remember, remember, the fifth of November"
                                      -- V for Vendetta
 

From: Dan D Niles <dan@more.net>
To: bug-followup@FreeBSD.org
Cc: Harrison Grundy <astrodog@gmail.com>, Jan Srzednicki <w@wrzask.pl>
Subject: Re: bin/111146: fsck fails on 6Tfilesystem
Date: Mon, 16 Apr 2007 14:08:57 -0500

 I attached the failed raid device to a newer server with 8G of RAM.  I
 booted to an amd64 kernel, and set datasize limit to 7G. 
 
 Resource limits (current):
   cputime          infinity secs
   filesize         infinity kB
   datasize          7340032 kB
   stacksize-cur        8192 kB
   coredumpsize     infinity kB
   memoryuse-cur     8093236 kB
   memorylocked-cur  1299644 kB
   maxprocesses         6164
   openfiles           12328
   sbsize           infinity bytes
   vmemoryuse       infinity kB
 
 
 Now when I run fsck I get:
 
 ** /dev/da0
 ** Last Mounted on /LSO
 ** Phase 1 - Check Blocks and Sizes
 fsck_ffs: bad inode number 53321728 to nextinode
 
 My theory is that some bits got flipped in the meta-data and
 cg_initediblk is getting a bad value.  The value of 1,572,191,256 that
 it returns just before it fails is greater than the total number of
 inodes, which is around 784,218,256.
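
The kind of bounds check this theory suggests can be sketched as follows.
It is not the fsck_ffs code. The function name is hypothetical, and the
bound (a cylinder group cannot have more initialized inodes than ipg, which
is 23552 in the dumpfs output earlier in this thread) is an assumption about
what a valid cg_initediblk looks like.

```python
def plausible_initediblk(cg_initediblk, ipg):
    """True if the per-group initialized-inode count lies within 0..ipg."""
    return 0 <= cg_initediblk <= ipg


IPG = 23552  # inodes per cylinder group, from the dumpfs output

# Two values seen before the failure, then the value seen at the failure.
for value in (384, 6208, 1572191256):
    status = "ok" if plausible_initediblk(value, IPG) else "out of range"
    print(value, status)
```

With such a check the bad value could be rejected for that one cylinder
group before any memory is requested.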
 
 It is distressing that some bits in the meta-data could get flipped
 during normal usage resulting in an unusable filesystem.
 
 I have 19 hours before I need to reformat the array and put it back into
 production.  Is there anything else I should try before then?
 
 Thanks,
 
 Dan
 
 
 
State-Changed-From-To: open->feedback 
State-Changed-By: linimon 
State-Changed-When: Wed Apr 25 22:28:39 UTC 2007 
State-Changed-Why:  
To submitter: did the fsck fix this problem? 


Responsible-Changed-From-To: freebsd-bugs->linimon 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed Apr 25 22:28:39 UTC 2007 
Responsible-Changed-Why:  

http://www.freebsd.org/cgi/query-pr.cgi?pr=111146 
State-Changed-From-To: feedback->suspended 
State-Changed-By: linimon 
State-Changed-When: Thu Apr 26 23:02:37 UTC 2007 
State-Changed-Why:  
Submitter had to format the drive, so we can't duplicate this right now. 
Set this to 'suspended' to note that it is a problem that probably still 
needs investigating. 

Responsible-Changed-From-To: linimon->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Jun 12 03:40:30 UTC 2007 
Responsible-Changed-Why:  
Return this one to the pool. 

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon May 18 04:33:11 UTC 2009 
Responsible-Changed-Why:  
Over to maintainer(s). 

>Unformatted:
