From jgreco@aurora.sol.net  Sun Jun 21 21:32:00 2009
Return-Path: <jgreco@aurora.sol.net>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A1AD01065673
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 21 Jun 2009 21:32:00 +0000 (UTC)
	(envelope-from jgreco@aurora.sol.net)
Received: from mail2.sol.net (mail2.sol.net [206.55.64.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 65D168FC1F
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 21 Jun 2009 21:32:00 +0000 (UTC)
	(envelope-from jgreco@aurora.sol.net)
Received: from aurora.sol.net (aurora.sol.net [206.55.65.130])
	by mail2.sol.net (8.14.1/8.14.1/SNNS-1.04) with ESMTP id n5LKbHUv067810
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 21 Jun 2009 15:37:18 -0500 (CDT)
Received: (from jgreco@localhost)
	by aurora.sol.net (8.12.8p1/8.12.9/Submit) id n5LKbWmt038127;
	Sun, 21 Jun 2009 15:37:32 -0500 (CDT)
Message-Id: <200906212037.n5LKbWmt038127@aurora.sol.net>
Date: Sun, 21 Jun 2009 15:37:32 -0500 (CDT)
From: Joe Greco <jgreco@ns.sol.net>
Reply-To: Joe Greco <jgreco@ns.sol.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Severe filesystem corruption - large files or large filesystems
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         135898
>Category:       kern
>Synopsis:       [geom] Severe filesystem corruption - large files or large filesystems
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-geom
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jun 21 21:40:03 UTC 2009
>Closed-Date:    
>Last-Modified:  Tue Feb 22 15:52:16 EST 2011
>Originator:     Joe Greco
>Release:        FreeBSD 6-7, i386/amd64, etc (incompatibilities)
>Organization:
sol.net Network Services
>Environment:

This is a follow-on to a problem previously described on freebsd-hackers,
where it was dismissed as "controller corruption."

We had noticed odd behaviour with a large file on a 1.5TB filesystem when
it was moved from i386 to amd64.  A ~400GB compressed dump refused to read
when attached to a 7.*R/amd64 box.  The theory put forth on -hackers was
that the dodgy SATA ctlr used to write the data was at fault, despite the
fact that the data read fine on various amd64 and i386 boxes that had a
good SATA ctlr, and it was never clear how that could be possible.

Now that the filesystem restore is done, we have acquired a second 1.5TB
disk, and we have some more experimental results.

Steps taken:

1) Place a Seagate 1.5TB SATA disk on a 3Ware 9550SX on a 6.1R/amd64 box.
   Create full disk filesystem.  Mount, etc.

   # dd if=/dev/random of=file bs=1048576 count=500000

   # cat genmds
   #! /bin/sh -

   uname -rp
   (
   echo "count    1 " `dd if=file bs=1048576 count=1 | md5`
   echo "count    4 " `dd if=file bs=1048576 count=4 | md5`
   echo "count   16 " `dd if=file bs=1048576 count=16 | md5`
   echo "count  256 " `dd if=file bs=1048576 count=256 | md5`
   echo "count 1024 " `dd if=file bs=1048576 count=1024 | md5`
   ) 2> /dev/null

   # sh genmds
   6.1-RELEASE amd64
   count    1  a78d2f52290367c76837fb00a16a4e79
   count    4  ba4e51e332a27ff8e5c817b4a95501d5
   count   16  af3cd4e9a10cb5679a50081cdb35d54f
   count  256  56cc987930c0c9e8246c8f91ed9f23bb
   count 1024  4bc8e00d9c20210bd5cc0ccfbd7bb1a3

   Gives us a baseline for comparison.

2) Unmount.  Move to 7.0R/amd64 box.

   # sh genmds
   7.0-RELEASE amd64
   count    1  88f830ae7f572282a2da19ffb3d036e4
   count    4  6b38b2c18f039859b10d8d33ffcc19c9
   count   16  2d40cb233ef4be44a3c33edc79c3aa05
   count  256  4fd629256316643099ddfdfd40afe56c
   count 1024  8ea0fa80158105d5b6f1768de5ceddc6

   More terrifyingly, when repeated, some answers *change*:

   # sh genmds
   7.0-RELEASE amd64
   count    1  d7c43b568d8f72ecbd47d2dc89062704
   count    4  e670a01847d4fe08958754ea434fbf6d
   count   16  c0c962024713c3db3e8d5070f7284413
   count  256  830e32f1b862c7b867ccaf05782ff769
   count 1024  8ea0fa80158105d5b6f1768de5ceddc6

   It's not even consistent.
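   The run-to-run variation can be checked mechanically.  Here is a
   minimal, self-contained sketch of the idea: read the same prefix of a
   file twice and compare checksums.  The small zero-filled stand-in
   file, the name "testfile", and the use of cksum are all illustrative
   assumptions; on an affected box you would point this at the 500GB
   test file and use md5 with larger counts, as genmds above does.

   ```shell
   #!/bin/sh
   # Sketch: detect nondeterministic reads by checksumming the same
   # prefix of a file twice.  "testfile" is a small stand-in; on the
   # affected filesystem, substitute the large test file.
   FILE=testfile
   dd if=/dev/zero of="$FILE" bs=1024 count=64 2>/dev/null

   sum1=$(dd if="$FILE" bs=1024 count=64 2>/dev/null | cksum)
   sum2=$(dd if="$FILE" bs=1024 count=64 2>/dev/null | cksum)

   if [ "$sum1" = "$sum2" ]; then
       echo "reads are stable"
   else
       echo "reads are NOT stable"
   fi
   ```

   On a healthy filesystem the two sums always match; on the 7.0R/amd64
   box above they would not.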

3) Unmount.  Move to 7.2R/i386 box.

   # sh genmds
   7.2-RELEASE i386
   count    1  a78d2f52290367c76837fb00a16a4e79
   count    4  ba4e51e332a27ff8e5c817b4a95501d5
   count   16  af3cd4e9a10cb5679a50081cdb35d54f
   count  256  56cc987930c0c9e8246c8f91ed9f23bb
   count 1024  4bc8e00d9c20210bd5cc0ccfbd7bb1a3

   Lookin' good.

So.  Experiment further.  Place a Deskstar 250GB SATA on the 6.1R/amd64,
write file.

4) basically same as 1), abbreviated for clarity

   # cat 61r-amd64 ; sh genmds
   6.1-RELEASE amd64
   count    1  65827a57009f618b3638f557246f40d8
   count    4  b3f5e9743173c29211545ff42f6df15d
   count   16  1b6a9d862522bf1091a47f91b874470c
   count  256  64545fa4dc6af95519dfd6644639518f
   count 1024  3ed6942e979a794eaa3acdd2908543f4
   7.0-RELEASE amd64
   count    1  65827a57009f618b3638f557246f40d8
   count    4  b537d1a50c85ee2e49cddb74a0afa9d0
   count   16  5632bd5c03008dd89ea332f19aa8240c
   count  256  980bb1ba249fcff0136867bb8b461a3c
   count 1024  980bb1ba249fcff0136867bb8b461a3c

Well, now that's interesting and puzzling.  It turns out that dd is
failing with:

dd: file: Input/output error
144+0 records in
144+0 records out
150994944 bytes transferred in 1.816704 secs (83114773 bytes/sec)
count  256  980bb1ba249fcff0136867bb8b461a3c
dd: file: Input/output error
144+0 records in
144+0 records out
150994944 bytes transferred in 1.816477 secs (83125159 bytes/sec)
count 1024  980bb1ba249fcff0136867bb8b461a3c

and I'm seeing

g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5

And... based on what I see, I'm guessing the "I/O error" is simply the
result of a ludicrous offset, but I could be wrong.
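That guess is easy to sanity-check: every offset in the log is either
negative or on the order of 10^17-10^18 bytes, far beyond any disk here.
A minimal sketch, using four offsets taken from the log above and an
assumed round media size for a 250GB disk (the real figure would come
from `diskinfo da1`):

```shell
#!/bin/sh
# Sketch: flag GEOM READ offsets that fall outside the provider.
# mediasize is an assumed ~250GB figure, not measured from the box.
mediasize=250059350016
outside=0
for off in -6844985820815177728 -7367456256827488256 \
           7136992927258085376 110823770104102912; do
    if [ "$off" -lt 0 ] || [ "$off" -ge "$mediasize" ]; then
        echo "offset $off lies outside the device"
        outside=$((outside + 1))
    fi
done
echo "$outside of 4 offsets are impossible"
```

All four sample offsets fail the range check, which is consistent with
GEOM returning error 5 (EIO) for out-of-range requests.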

My best guess is that there is something amiss on FreeBSD 7.*/amd64 relating
to the filesystem code.

It appears to be repeatable.  Would anyone care to try?

... JG
>Description:
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-geom
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Jun 22 00:43:05 UTC 2009
Responsible-Changed-Why:
I'm going to take a guess and assign this to the geom mailing list.

http://www.freebsd.org/cgi/query-pr.cgi?pr=135898
State-Changed-From-To: open->feedback
State-Changed-By: eadler
State-Changed-When: Mon Feb 21 21:54:51 EST 2011
State-Changed-Why:
Can you reproduce this on a recent version of freebsd?

http://www.freebsd.org/cgi/query-pr.cgi?pr=135898
State-Changed-From-To: feedback->open 
State-Changed-By: eadler 
State-Changed-When: Tue Feb 22 15:45:22 EST 2011 
State-Changed-Why:  
Feedback received outside audit trail 

http://www.freebsd.org/cgi/query-pr.cgi?pr=135898 
>Unformatted:
