From kjm@arc.umn.edu Tue Nov 23 13:32:31 1999
Return-Path: <kjm@arc.umn.edu>
Received: from kosh.arc.umn.edu (kosh.arc.umn.edu [137.66.130.40])
	by hub.freebsd.org (Postfix) with ESMTP id BFEBB15420
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 23 Nov 1999 13:32:16 -0800 (PST)
	(envelope-from kjm@arc.umn.edu)
Received: from delenn.arc.umn.edu (delenn.arc.umn.edu [137.66.132.150])
	by kosh.arc.umn.edu (8.9.3/8.9.3) with ESMTP id PAA16242
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 23 Nov 1999 15:32:12 -0600 (CST)
	(envelope-from kjm@delenn.arc.umn.edu)
Received: (from root@localhost)
	by delenn.arc.umn.edu (8.9.3/8.9.3) id PAA42114;
	Tue, 23 Nov 1999 15:32:11 -0600 (CST)
	(envelope-from kjm)
Message-Id: <199911232132.PAA42114@delenn.arc.umn.edu>
Date: Tue, 23 Nov 1999 15:32:11 -0600 (CST)
From: "Kevin J. Meehan" <kjm@arc.umn.edu>
Reply-To: kjm@arc.umn.edu
To: FreeBSD-gnats-submit@freebsd.org
Subject: fsck can't fix "huge" zero length files
X-Send-Pr-Version: 3.2

>Number:         15065
>Category:       kern
>Synopsis:       fsck can't fix "huge" zero length files
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 23 13:40:00 PST 1999
>Closed-Date:    Wed Jan 31 07:19:54 PST 2001
>Last-Modified:  Wed Jan 31 07:30:31 PST 2001
>Originator:     Kevin J. Meehan
>Release:        FreeBSD 3.3-STABLE i386
>Organization:
Network Computing Services, Inc.
>Environment:

	We have a FreeBSD PC with a RAID subsystem that has a pair of
	CMD TECH CRD-5640 SCSI controllers meant to back each other
	up should one fail. Unfortunately they had the first version
	of the firmware, which was prone to double faulting: both
	controllers would hang, and a power cycle was the only thing
	that would unlock them.

>Description:

	After a particularly bad week with 3 "double faults", one of 
	the 5 filesystems on the RAID subsystem had some odd files 
	the system pushed to the lost+found directory:

	c--Sr-S--T 1 cheng  si        147, 0x004f00ac Dec 31  1969 #1261947
	lrwxr-S--- 1 demlow ncs   4638043271730144200 Dec 31  1969 #2484071@ -> 
	s--x--S--- 1 root   wheel 4631321269953217064 Dec 31  1969 #2484095=

	The size returned by dump was a large negative number and thus broke
	Amanda. An umount and fsck of the filesystem would not fix the above.


>How-To-Repeat:

	Difficult: get hold of a bum controller and have it munge your filesystem.
	(Not recommended!)

>Fix:
	
	We finally needed to use fsdb to remove the offending inodes. Note that
	while the link and file show huge sizes, their block counts are 0:

	fsdb (inum: 2)> inode 2484071
	current inode: symlink
	I=2484071 MODE=122740 SIZE=4638043271730144200
	        MTIME=Dec 31 18:00:00 1969 [1 nsec]
	        CTIME=Oct 20 14:23:00 1999 [0 nsec]
	        ATIME=Dec 31 18:00:00 1969 [0 nsec]
	OWNER=demlow GRP=ncs LINKCNT=1 FLAGS=0 BLKCNT=0 GEN=56ef2b95
	
	fsdb (inum: 2484071)> inode 2484095
	current inode: socket
	I=2484095 MODE=142100 SIZE=4631321269953217064
	        MTIME=Dec 31 18:00:00 1969 [1 nsec]
	        CTIME=Apr  9 17:05:23 1999 [0 nsec]
	        ATIME=Dec 31 18:00:00 1969 [0 nsec]
	OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=0 GEN=25cd617b
	
	fsdb (inum: 2484095)> inode 1261947
	current inode: character special (147,5177516)I=1261947
		MODE=27040 SIZE=4639318980096568328
	        MTIME=Dec 31 18:00:00 1969 [1 nsec]
	        CTIME=Sep  9 13:30:59 1999 [0 nsec]
	        ATIME=Dec 31 18:00:00 1969 [0 nsec]
	OWNER=cheng GRP=si LINKCNT=1 FLAGS=0 BLKCNT=0 GEN=1552126f
	
	# /sbin/fsck /home1
	** /dev/rda2s1e
	** Last Mounted on /mnt
	** Phase 1 - Check Blocks and Sizes
	** Phase 2 - Check Pathnames
	UNALLOCATED  I=2484071  OWNER=root MODE=0
	SIZE=0 MTIME=Dec 31 18:00 1969 
	NAME=/lost+found/#2484071
	
	REMOVE? [yn] y
	
	UNALLOCATED  I=2484095  OWNER=root MODE=0
	SIZE=0 MTIME=Dec 31 18:00 1969 
	NAME=/lost+found/#2484095
	
	REMOVE? [yn] y
	
	UNALLOCATED  I=1261947  OWNER=root MODE=0
	SIZE=0 MTIME=Dec 31 18:00 1969 
	NAME=/lost+found/#1261947
	
	REMOVE? [yn] y
	
	** Phase 3 - Check Connectivity
	** Phase 4 - Check Reference Counts
	** Phase 5 - Check Cyl groups
	FREE BLK COUNT(S) WRONG IN SUPERBLK
	SALVAGE? [yn] y
	
	SUMMARY INFORMATION BAD
	SALVAGE? [yn] y
	
	BLK(S) MISSING IN BIT MAPS
	SALVAGE? [yn] y
	
	56770 files, 3812612 used, 6350567 free
		(6759 frags, 792976 blocks, 0.1% fragmentation)
	
	***** FILE SYSTEM MARKED CLEAN *****
	
	***** FILE SYSTEM WAS MODIFIED *****
	# /sbin/fsck /home1
	** /dev/rda2s1e
	** Last Mounted on /mnt
	** Phase 1 - Check Blocks and Sizes
	** Phase 2 - Check Pathnames
	** Phase 3 - Check Connectivity
	** Phase 4 - Check Reference Counts
	** Phase 5 - Check Cyl groups
	56770 files, 3812612 used, 6350567 free
	(6759 frags, 792976 blocks, 0.1% fragmentation)

	We finally upgraded the firmware on the controllers from version 1
	to 9 on Nov 3rd and have been trouble free up to this point, although
	the lockups were usually about a month apart anyway.

	We were torn as to whether or not this should be reported. For one
	thing we had bum hardware. That is not FreeBSD's fault. On the other
	hand, the sizes reported are obviously ridiculously huge and the block
	counts are zero. In the end we decided to let you know and let you 
	decide whether fsck should fix something like this automatically, or
	flag it as something that needs to be manually removed, or not.
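
As an illustration of the kind of check suggested above, here is a minimal
sketch in C. It is not what fsck does. The PR_* names and
size_is_implausible() are hypothetical, and the rules are assumptions: a
symlink target is taken to be at most 1024 bytes (MAXPATHLEN), and sockets
and character devices are taken to carry no file data, so a nonzero size
with a zero block count is treated as suspect. Regular files are left alone
because a sparse file can legitimately have a large size and few blocks.

```c
#include <stdint.h>

/* File-type bits of the inode mode (octal), as shown in fsdb's MODE= field. */
#define PR_IFMT		0170000
#define PR_IFCHR	0020000
#define PR_IFLNK	0120000
#define PR_IFSOCK	0140000

/* Assumption: longest symlink target, taken to be MAXPATHLEN (1024). */
#define PR_MAXLINK	1024

/*
 * Hypothetical plausibility test for the size field of one inode.
 * Returns 1 if the size looks impossible for the file type, 0 otherwise.
 */
static int
size_is_implausible(unsigned mode, uint64_t size, uint64_t blkcnt)
{
	switch (mode & PR_IFMT) {
	case PR_IFLNK:
		/* A symlink cannot point at a path longer than the limit. */
		return (size > PR_MAXLINK);
	case PR_IFCHR:
	case PR_IFSOCK:
		/* Assumed to hold no data: size without blocks is suspect. */
		return (size != 0 && blkcnt == 0);
	default:
		/* Regular files and directories are not judged here. */
		return (0);
	}
}
```

Fed the MODE, SIZE and BLKCNT values from the fsdb output above, this
sketch flags all three inodes, while a regular file of any size is passed
through untouched.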


>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: "Kevin J. Meehan" <kjm@arc.umn.edu>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/15065: fsck can't fix "huge" zero length files
Date: Thu, 25 Nov 1999 00:07:11 +1100 (EST)

 On Tue, 23 Nov 1999, Kevin J. Meehan wrote:
 
 > >Description:
 > 
 > 	After a particularly bad week with 3 "double faults", one of 
 > 	the 5 filesystems on the RAID subsystem had some odd files 
 > 	the system pushed to the lost+found directory:
 > 
 > 	c--Sr-S--T 1 cheng  si        147, 0x004f00ac Dec 31  1969 #1261947
 > 	lrwxr-S--- 1 demlow ncs   4638043271730144200 Dec 31  1969 #2484071@ -> 
 > 	s--x--S--- 1 root   wheel 4631321269953217064 Dec 31  1969 #2484095=
 > 
 > 	The size returned by dump was a large negative number and thus broke
 > 	Amanda. An umount and fsck of the filesystem would not fix the above.
 
 Try this fix.  I wrote it to fix corrupted holey files of size 17TB on ffs
 with a blocksize of 8KB while fixing ffs to support such files.  The fixes
 are incomplete and have not been committed.
 
 diff -c2 pass1.c~ pass1.c
 *** pass1.c~	Sun Aug 29 11:00:46 1999
 --- pass1.c	Sun Aug 29 11:00:57 1999
 ***************
 *** 174,178 ****
   	register struct dinode *dp;
   	struct zlncnt *zlnp;
 ! 	int ndb, j;
   	mode_t mode;
   	char *symbuf;
 --- 174,180 ----
   	register struct dinode *dp;
   	struct zlncnt *zlnp;
 ! 	u_int64_t bigndb;
 ! 	ufs_daddr_t j;
 ! 	u_long ndb;
   	mode_t mode;
   	char *symbuf;
 ***************
 *** 210,220 ****
   		inodirty();
   	}
 ! 	ndb = howmany(dp->di_size, sblock.fs_bsize);
 ! 	if (ndb < 0) {
   		if (debug)
 ! 			printf("bad size %qu ndb %d:",
 ! 				dp->di_size, ndb);
   		goto unknown;
   	}
   	if (mode == IFBLK || mode == IFCHR)
   		ndb++;
 --- 212,223 ----
   		inodirty();
   	}
 ! 	bigndb = howmany(dp->di_size, sblock.fs_bsize);
 ! 	if (bigndb != 0 && (ufs_daddr_t)(bigndb - 1) != bigndb - 1) {
   		if (debug)
 ! 			printf("bad size %qu bigndb %qu:",
 ! 			    dp->di_size, bigndb);
   		goto unknown;
   	}
 + 	ndb = (u_long)bigndb;
   	if (mode == IFBLK || mode == IFCHR)
   		ndb++;
 ***************
 *** 252,256 ****
   		}
   	}
 ! 	for (j = ndb; j < NDADDR; j++)
   		if (dp->di_db[j] != 0) {
   			if (debug)
 --- 255,260 ----
   		}
   	}
 ! 	if (ndb < NDADDR)
 ! 	    for (j = ndb; j < NDADDR; j++)
   		if (dp->di_db[j] != 0) {
   			if (debug)
 ***************
 *** 259,263 ****
   			goto unknown;
   		}
 ! 	for (j = 0, ndb -= NDADDR; ndb > 0; j++)
   		ndb /= NINDIR(&sblock);
   	for (; j < NIADDR; j++)
 --- 263,267 ----
   			goto unknown;
   		}
 ! 	for (j = 0, ndb -= NDADDR; (ufs_daddr_t)ndb > 0; j++)
   		ndb /= NINDIR(&sblock);
   	for (; j < NIADDR; j++)
 
 Bruce
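
The size test changed by the patch above can be tried in isolation. The
sketch below is not the fsck source. It assumes ufs_daddr_t is a signed
32-bit type (the hypothetical daddr32_t stands in for it), and the helper
names old_check_rejects() and new_check_rejects() are made up for this
example. It compares the old "ndb < 0" test with the patched round-trip
test.

```c
#include <stdint.h>

/* Assumption: stands in for ufs_daddr_t, taken here to be signed 32-bit. */
typedef int32_t daddr32_t;

/* Round-up division, same arithmetic as howmany() in <sys/param.h>. */
#define HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/*
 * Old test: store the block count in a 32-bit int and reject only if it
 * comes out negative.  The narrowing conversion is implementation-defined
 * in C; common compilers keep the low 32 bits.
 */
static int
old_check_rejects(uint64_t size, uint64_t bsize)
{
	int32_t ndb = (int32_t)HOWMANY(size, bsize);

	return (ndb < 0);
}

/*
 * Patched test: reject unless the index of the last block (bigndb - 1)
 * survives a round trip through the 32-bit block-number type.
 */
static int
new_check_rejects(uint64_t size, uint64_t bsize)
{
	uint64_t bigndb = HOWMANY(size, bsize);

	return (bigndb != 0 &&
	    (uint64_t)(daddr32_t)(bigndb - 1) != bigndb - 1);
}
```

With an 8 KB block size, the symlink size from the report
(4638043271730144200) gives a block count whose low 32 bits are
0x1E4202EE, which is positive, so the old test lets the inode through and
the patched test rejects it. Under the same assumptions the patched test
accepts sizes up to 2^31 * 8192 = 2^44 bytes, about 17.6 TB.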
 
 
State-Changed-From-To: open->closed 
State-Changed-By: iedowse 
State-Changed-When: Wed Jan 31 07:19:54 PST 2001 
State-Changed-Why:  
Fixed in revision 1.21 of src/sbin/fsck_ffs/pass1.c. I'll merge this 
into -stable in a few days. Thanks for the bug report! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=15065 
>Unformatted:
