From hsu@news.bbnetworks.net  Fri Aug 21 18:07:12 2009
Return-Path: <hsu@news.bbnetworks.net>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CC9FF106568E
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 21 Aug 2009 18:07:12 +0000 (UTC)
	(envelope-from hsu@news.bbnetworks.net)
Received: from news.bbnetworks.net (news.bbnetworks.net [212.16.96.3])
	by mx1.freebsd.org (Postfix) with ESMTP id 6379E8FC16
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 21 Aug 2009 18:07:12 +0000 (UTC)
Received: from news.bbnetworks.net (localhost [127.0.0.1])
	by news.bbnetworks.net (8.14.3/8.14.3) with ESMTP id n7LEbAc3001168
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 21 Aug 2009 17:37:11 +0300 (EEST)
	(envelope-from hsu@news.bbnetworks.net)
Received: (from root@localhost)
	by news.bbnetworks.net (8.14.3/8.14.3/Submit) id n7L3fVOi001058;
	Fri, 21 Aug 2009 06:41:31 +0300 (EEST)
	(envelope-from hsu)
Message-Id: <200908210341.n7L3fVOi001058@news.bbnetworks.net>
Date: Fri, 21 Aug 2009 06:41:31 +0300 (EEST)
From: Heikki Suonsivu <hsu@bbnetworks.net>
Reply-To: Heikki Suonsivu <hsu@bbnetworks.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: fsck_ffs broken, partial patch
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         138043
>Category:       bin
>Synopsis:       [patch] fsck_ffs(8) broken, partial patch
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-fs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 21 18:10:01 UTC 2009
>Closed-Date:    Thu Jan 07 01:13:33 UTC 2010
>Last-Modified:  Thu Jan  7 02:00:14 UTC 2010
>Originator:     Heikki Suonsivu
>Release:        FreeBSD 7.2-STABLE amd64
>Organization:
bbnetworks.net
>Environment:
System: FreeBSD news.bbnetworks.net 7.2-STABLE FreeBSD 7.2-STABLE #0: Thu Aug 13 22:42:05 EEST 2009 hsu@news.bbnetworks.net:/usr/obj/usr/src/sys/BBNETWORKS7NEWS amd64

	Possibly all versions of FreeBSD with UFS2.

>Description:

	fsck_ffs trusts the count of initialized inodes stored in the
	cylinder group header (cg_initediblk) on UFS2 filesystems.
	Unfortunately, disk or memory corruption, whatever the cause,
	may damage that particular value.  If the corrupt value is too
	large, this is easy to detect by comparing it to the per-group
	maximum in the superblock (fs_ipg).  If it is too small, bad
	things may still happen, as not all inodes are checked,
	possibly causing loss of files.  The patch below works around
	the too-large case.  I think the whole optimization of
	trusting the cylinder group header is too optimistic, and
	fsck_ffs should probably return to the UFS1 behavior, even
	at the cost of a performance penalty.

>How-To-Repeat:

	Have your 3ware 9500S RAID controller go nuts when hotswapping
	a disk in the pack, along with an apparent BBU failure which
	may or may not be related.  Alternatively, use any other flaky
	hardware and wait.  The server this happened on has ECC memory,
	which leaves the controller or the disks as the most likely
	source.  This is probably not a very frequent event, if I am the
	first person ever to stumble upon it.

	The problem shows up as the error "bad inode number xxx to
	nextinode", because getnextinode gets called with an inumber
	beyond that particular cylinder group.  fsck_ffs exits in this
	case, leaving the filesystem inaccessible.  Forcibly mounting
	the filesystem read-only caused an immediate panic.  There was
	a lot of corruption, most of it concentrated in an area around
	this cylinder group, with little damage elsewhere.
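
	To illustrate what "beyond that particular cylinder group" means:
	pass1 starts each group at inumber = c * sblock.fs_ipg, so group c
	covers the fs_ipg inode numbers from that point on.  The range
	test below is only a sketch of that arithmetic; inode_in_group()
	is a hypothetical name and is not the check getnextinode performs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: true when inumber lies inside cylinder group c,
 * given ipg inodes per group (sblock.fs_ipg in fsck_ffs).
 */
static int
inode_in_group(uint64_t inumber, uint32_t c, uint32_t ipg)
{
	uint64_t first = (uint64_t)c * ipg;	/* first inode of group c */

	return (inumber >= first && inumber < first + ipg);
}
```

	With a garbled cg_initediblk larger than fs_ipg, the scan loop runs
	past first + ipg and asks for inode numbers that fail this test.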

>Fix:

This is a partial fix, as it only handles the detectable situation where
cgrp.cg_initediblk is larger than the number of inodes per cylinder group.
A possibly better alternative is to fall back to the UFS1 code in this case.

Should this fallback also be triggered by other anomalies, for example by
redoing pass1 when the number of inodes used mismatches in some way?  Would
that catch the too-small case?

Anyway, fsck should never ever exit, nor should it make optimistic
assumptions about the disk state.  I did not analyze the softupdates
cases; it's already 3:45am...

The patch below also fixes a mildly confusing error message.

Index: main.c
===================================================================
RCS file: /usr/CVS/src/sbin/fsck_ffs/main.c,v
retrieving revision 1.47.2.6
diff -u -r1.47.2.6 main.c
--- main.c	27 Apr 2009 19:15:14 -0000	1.47.2.6
+++ main.c	20 Aug 2009 19:38:41 -0000
@@ -412,7 +412,10 @@
 	 */
 	if (duplist) {
 		if (preen || usedsoftdep)
-			pfatal("INTERNAL ERROR: dups with -p");
+		  	pfatal("INTERNAL ERROR: dups with %s%s%s", 
+			       preen ? "-p" : "", 
+			       (preen && usedsoftdep) ? " and " : "",
+			       usedsoftdep ? "softupdates" : "");
 		printf("** Phase 1b - Rescan For More DUPS\n");
 		pass1b();
 	}
Index: pass1.c
===================================================================
RCS file: /usr/CVS/src/sbin/fsck_ffs/pass1.c,v
retrieving revision 1.43
diff -u -r1.43 pass1.c
--- pass1.c	8 Oct 2004 20:44:47 -0000	1.43
+++ pass1.c	21 Aug 2009 02:40:57 -0000
@@ -93,10 +93,20 @@
 		inumber = c * sblock.fs_ipg;
 		setinodebuf(inumber);
 		getblk(&cgblk, cgtod(&sblock, c), sblock.fs_cgsize);
-		if (sblock.fs_magic == FS_UFS2_MAGIC)
+		if (sblock.fs_magic == FS_UFS2_MAGIC) {
 			inosused = cgrp.cg_initediblk;
-		else
+			if (inosused > sblock.fs_ipg) {
+			  /* If cgrp.cg_initediblk is impossible, ignore it.
+			   * This may indicate a bigger problem? */
+			  pwarn("Garbled number of initialized inodes (%d > %d) in cylinder group %d\n", 
+				inosused, sblock.fs_ipg, c);
+			  /* Set the value to maximum per cylinder group,
+			   * like UFS1. */
+			  inosused = sblock.fs_ipg;
+			}
+		} else {
 			inosused = sblock.fs_ipg;
+		}
 		if (got_siginfo) {
 			printf("%s: phase 1: cyl group %d of %d (%d%%)\n",
 			    cdevname, c, sblock.fs_ncg,
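
The message composition in the main.c hunk above can be tried in
isolation.  This is a minimal sketch: format_dups_error() is a
hypothetical helper, not a function in fsck_ffs, and it writes into a
caller-supplied buffer with snprintf() instead of calling pfatal().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the "dups with ..." message from the two
 * flags.  Returns the length snprintf() reports for the full message.
 */
static int
format_dups_error(char *buf, size_t len, int preen, int usedsoftdep)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "INTERNAL ERROR: dups with %s%s%s",
	    preen ? "-p" : "",
	    (preen && usedsoftdep) ? " and " : "",
	    usedsoftdep ? "softupdates" : ""));
}
```

With only preen set the text ends in "-p", with only usedsoftdep set it
ends in "softupdates", and with both it ends in "-p and softupdates",
so the message no longer blames -p when softupdates was the trigger.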



>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: gavin 
Responsible-Changed-When: Sun Jan 3 12:34:54 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138043 
State-Changed-From-To: open->closed 
State-Changed-By: mckusick 
State-Changed-When: Thu Jan 7 01:11:31 UTC 2010 
State-Changed-Why:  
The suggested fix was added in r176575 by delphij on 2008-02-25. 
The more detailed error messages were added by me in r201708. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138043 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/138043: commit references a PR
Date: Thu,  7 Jan 2010 01:11:08 +0000 (UTC)

 Author: mckusick
 Date: Thu Jan  7 01:10:49 2010
 New Revision: 201708
 URL: http://svn.freebsd.org/changeset/base/201708
 
 Log:
   Add some error messages suggested in PR bin/138043. The code to
   correct the problem was added in r176575 by delphij on 2008-02-25.
   
   PR:		138043
   Reported by:	Heikki Suonsivu
 
 Modified:
   head/sbin/fsck_ffs/main.c
   head/sbin/fsck_ffs/pass1.c
 
 Modified: head/sbin/fsck_ffs/main.c
 ==============================================================================
 --- head/sbin/fsck_ffs/main.c	Thu Jan  7 00:57:40 2010	(r201707)
 +++ head/sbin/fsck_ffs/main.c	Thu Jan  7 01:10:49 2010	(r201708)
 @@ -406,7 +406,10 @@ checkfilesys(char *filesys)
  	 */
  	if (duplist) {
  		if (preen || usedsoftdep)
 -			pfatal("INTERNAL ERROR: dups with -p");
 +			pfatal("INTERNAL ERROR: dups with %s%s%s",
 +			    preen ? "-p" : "",
 +			    (preen && usedsoftdep) ? " and " : "",
 +			    usedsoftdep ? "softupdates" : "");
  		printf("** Phase 1b - Rescan For More DUPS\n");
  		pass1b();
  	}
 
 Modified: head/sbin/fsck_ffs/pass1.c
 ==============================================================================
 --- head/sbin/fsck_ffs/pass1.c	Thu Jan  7 00:57:40 2010	(r201707)
 +++ head/sbin/fsck_ffs/pass1.c	Thu Jan  7 01:10:49 2010	(r201708)
 @@ -98,10 +98,16 @@ pass1(void)
  			rebuildcg = 1;
  		if (!rebuildcg && sblock.fs_magic == FS_UFS2_MAGIC) {
  			inosused = cgrp.cg_initediblk;
 -			if (inosused > sblock.fs_ipg)
 +			if (inosused > sblock.fs_ipg) {
 +				pfatal("%s (%d > %d) %s %d\nReset to %d\n",
 +				    "Too many initialized inodes", inosused,
 +				    sblock.fs_ipg, "in cylinder group", c,
 +				    sblock.fs_ipg);
  				inosused = sblock.fs_ipg;
 -		} else
 +			}
 +		} else {
  			inosused = sblock.fs_ipg;
 +		}
  		if (got_siginfo) {
  			printf("%s: phase 1: cyl group %d of %d (%d%%)\n",
  			    cdevname, c, sblock.fs_ncg,
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/138043: commit references a PR
Date: Thu,  7 Jan 2010 01:55:43 +0000 (UTC)

 Author: delphij
 Date: Thu Jan  7 01:55:34 2010
 New Revision: 201711
 URL: http://svn.freebsd.org/changeset/base/201711
 
 Log:
   MFC r176575:
   
   In pass1(), cap inosused to fs_ipg rather than allowing an arbitrary
   number read from the cylinder group.  Chances are that we read a
   smashed cylinder group, and we cannot 100% trust the information it
   has supplied.  fsck_ffs(8) will otherwise crash in some cases.
   
   PR:		bin/138043
   Reminded by:	mckusick
 
 Modified:
   stable/7/sbin/fsck_ffs/pass1.c
 Directory Properties:
   stable/7/sbin/fsck_ffs/   (props changed)
 
 Modified: stable/7/sbin/fsck_ffs/pass1.c
 ==============================================================================
 --- stable/7/sbin/fsck_ffs/pass1.c	Thu Jan  7 01:24:09 2010	(r201710)
 +++ stable/7/sbin/fsck_ffs/pass1.c	Thu Jan  7 01:55:34 2010	(r201711)
 @@ -93,9 +93,11 @@ pass1(void)
  		inumber = c * sblock.fs_ipg;
  		setinodebuf(inumber);
  		getblk(&cgblk, cgtod(&sblock, c), sblock.fs_cgsize);
 -		if (sblock.fs_magic == FS_UFS2_MAGIC)
 +		if (sblock.fs_magic == FS_UFS2_MAGIC) {
  			inosused = cgrp.cg_initediblk;
 -		else
 +			if (inosused > sblock.fs_ipg)
 +				inosused = sblock.fs_ipg;
 +		} else
  			inosused = sblock.fs_ipg;
  		if (got_siginfo) {
  			printf("%s: phase 1: cyl group %d of %d (%d%%)\n",
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/138043: commit references a PR
Date: Thu,  7 Jan 2010 01:56:48 +0000 (UTC)

 Author: delphij
 Date: Thu Jan  7 01:56:35 2010
 New Revision: 201712
 URL: http://svn.freebsd.org/changeset/base/201712
 
 Log:
   MFC r176575:
   
   In pass1(), cap inosused to fs_ipg rather than allowing an arbitrary
   number read from the cylinder group.  Chances are that we read a
   smashed cylinder group, and we cannot 100% trust the information it
   has supplied.  fsck_ffs(8) will otherwise crash in some cases.
   
   PR:		bin/138043
   Reminded by:	mckusick
 
 Modified:
   stable/6/sbin/fsck_ffs/pass1.c
 Directory Properties:
   stable/6/sbin/fsck_ffs/   (props changed)
 
 Modified: stable/6/sbin/fsck_ffs/pass1.c
 ==============================================================================
 --- stable/6/sbin/fsck_ffs/pass1.c	Thu Jan  7 01:55:34 2010	(r201711)
 +++ stable/6/sbin/fsck_ffs/pass1.c	Thu Jan  7 01:56:35 2010	(r201712)
 @@ -93,9 +93,11 @@ pass1(void)
  		inumber = c * sblock.fs_ipg;
  		setinodebuf(inumber);
  		getblk(&cgblk, cgtod(&sblock, c), sblock.fs_cgsize);
 -		if (sblock.fs_magic == FS_UFS2_MAGIC)
 +		if (sblock.fs_magic == FS_UFS2_MAGIC) {
  			inosused = cgrp.cg_initediblk;
 -		else
 +			if (inosused > sblock.fs_ipg)
 +				inosused = sblock.fs_ipg;
 +		} else
  			inosused = sblock.fs_ipg;
  		if (got_siginfo) {
  			printf("%s: phase 1: cyl group %d of %d (%d%%)\n",
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/138043: commit references a PR
Date: Thu,  7 Jan 2010 01:57:22 +0000 (UTC)

 Author: delphij
 Date: Thu Jan  7 01:57:13 2010
 New Revision: 201713
 URL: http://svn.freebsd.org/changeset/base/201713
 
 Log:
   MFC r176575:
   
   In pass1(), cap inosused to fs_ipg rather than allowing an arbitrary
   number read from the cylinder group.  Chances are that we read a
   smashed cylinder group, and we cannot 100% trust the information it
   has supplied.  fsck_ffs(8) will otherwise crash in some cases.
   
   PR:		bin/138043
   Reminded by:	mckusick
 
 Modified:
   stable/5/sbin/fsck_ffs/pass1.c
 Directory Properties:
   stable/5/sbin/fsck_ffs/   (props changed)
 
 Modified: stable/5/sbin/fsck_ffs/pass1.c
 ==============================================================================
 --- stable/5/sbin/fsck_ffs/pass1.c	Thu Jan  7 01:56:35 2010	(r201712)
 +++ stable/5/sbin/fsck_ffs/pass1.c	Thu Jan  7 01:57:13 2010	(r201713)
 @@ -93,9 +93,11 @@ pass1(void)
  		inumber = c * sblock.fs_ipg;
  		setinodebuf(inumber);
  		getblk(&cgblk, cgtod(&sblock, c), sblock.fs_cgsize);
 -		if (sblock.fs_magic == FS_UFS2_MAGIC)
 +		if (sblock.fs_magic == FS_UFS2_MAGIC) {
  			inosused = cgrp.cg_initediblk;
 -		else
 +			if (inosused > sblock.fs_ipg)
 +				inosused = sblock.fs_ipg;
 +		} else
  			inosused = sblock.fs_ipg;
  		if (got_siginfo) {
  			printf("%s: phase 1: cyl group %d of %d (%d%%)\n",
 
>Unformatted:
