From nobody@FreeBSD.org  Sun May 22 03:25:57 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CFC08106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 22 May 2011 03:25:57 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id B6E548FC0A
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 22 May 2011 03:25:57 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p4M3PvGL074101
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 22 May 2011 03:25:57 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p4M3PvYD074100;
	Sun, 22 May 2011 03:25:57 GMT
	(envelope-from nobody)
Message-Id: <201105220325.p4M3PvYD074100@red.freebsd.org>
Date: Sun, 22 May 2011 03:25:57 GMT
From: Gene Stark <gene@home.starkeffect.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: dump/restore: unknown tape header type  -230747966
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         157244
>Category:       bin
>Synopsis:       dump(1): dump/restore: unknown tape header type  -230747966
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 22 03:30:10 UTC 2011
>Closed-Date:    
>Last-Modified:  Tue May 24 03:03:21 UTC 2011
>Originator:     Gene Stark
>Release:        8.0-RELEASE
>Organization:
>Environment:
FreeBSD home.starkeffect.com 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #10: Fri Jul 16 12:32:08 EDT 2010     root@home.starkeffect.com:/huge/src/sys/i386/compile/STARKHOME-SMP_8_0  i386

>Description:
I made an 18 gig dump of a (gvinum) filesystem using the command:
"dump 0f - /dev/gvinum/A > A.dump".  There were no problems reported during
the dump, and the volume fsck'ed clean beforehand.  I newfs'ed the volume and
attempted to restore via "restore rf - < /A.dump" and it failed with the
error message: unknown tape header type -230747966.  This was quite irritating:
I have grown to trust dump/restore over many years, and because of the size
I had already destroyed the original volume without first trying to read
through the dump file with restore.

I spent substantial time analyzing the dump to try to determine the failure
mode.  It turns out that header blocks actually occur out-of-order in
the dump file, as indicated by comparing the actual offset of the block
in the dump file with the spcl.c_tapea fields of the headers.
Once the problem started (during a large file near the beginning of the
dump), the difference between the actual offset (in units of TP_BSIZE)
and the offset claimed in the spcl.c_tapea field was always -10, 0, or 20.
That is, sometimes the header blocks came earlier than expected, sometimes
they came on time, and sometimes they came later, with only those few
displacements ever occurring.

I wrote a program to read the dump records and reorder them so that the
headers were emitted at their claimed offsets.  This was done by queueing
the headers and data blocks separately, emitting headers when they were due,
and emitting data blocks otherwise.  This program could then verify that
the correct number of data blocks were present to match the information in
the headers.  However, when I pipe the reordered block stream into restore,
there are still some issues.  For one thing, there is no way to verify the
order of the data blocks, and it seems that reordering might also have
occurred on those.  I have other copies of some of the large files that
were in the dump, and I will attempt to determine how the data blocks
have been reordered, but I haven't done that yet.
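The reordering pass can be sketched roughly as follows.  This is a
simplified, in-memory sketch (the real stream would be processed
incrementally); is_header and claimed_offset stand in for the magic-number
test and the spcl.c_tapea field.

```python
from collections import deque

def reorder(blocks, is_header, claimed_offset):
    """Re-emit dump records so that each header lands at the offset its
    spcl.c_tapea field claims; data records fill the remaining slots in
    arrival order (their own relative order cannot be verified)."""
    headers = {}            # claimed offset -> header record
    data = deque()
    for blk in blocks:
        if is_header(blk):
            headers[claimed_offset(blk)] = blk
        else:
            data.append(blk)
    out, pos = [], 0
    while headers or data:
        if pos in headers:
            out.append(headers.pop(pos))   # a header is due at this slot
        elif data:
            out.append(data.popleft())     # otherwise emit next data record
        else:
            # stream ended with a header still queued; emit it anyway
            out.append(headers.pop(min(headers)))
        pos += 1
    return out
```

As noted above, this can verify that the header positions and counts are
consistent, but it cannot detect reordering among the data records
themselves.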

I was at a loss to explain how this kind of reordering could have occurred,
until I read some of the source to dump and saw that it is using multiple
processes to write the dump file.  I am running on a 2-core system (4 CPUs
after hyperthreading).  I strongly suspect a concurrency issue in the way
the dump tape is written; otherwise I don't see how the header blocks could
have been reordered in the way I observed.

>How-To-Repeat:
Although I have (unfortunately) already destroyed the original filesystem,
I was able to repeat the behavior on another filesystem using the following
command:

home# dump 0f - /mail | restore rfN -
  DUMP: Date of this level 0 dump: Sat May 21 22:57:40 2011
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/gvinum/mail_new (/mail) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 11291623 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
unknown tape header type 1781888358
abort? [yn] y
dump core? [yn] n
  DUMP: Broken pipe
  DUMP: The ENTIRE dump is aborted.

This problem really needs to be looked into, because it is a disaster to
create an apparently successful dump in preparation for a simple
filesystem volume rebuild, only to find that it fails on restore.

The reordering of the dump stream to put the header blocks back in their
proper positions helps quite a bit, but I have not been able to recover
my data at this time, because the data blocks are also apparently
reordered.  If there is a systematic mechanism to the reordering, I might
still be able to recover, but if it is a concurrency/synchronization thing
it might well be hopeless.


>Fix:


>Release-Note:
>Audit-Trail:

From: Gene Stark <gene@home.starkeffect.com>
To: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
Cc:  
Subject: Re: bin/157244: dump/restore: unknown tape header type  -230747966
Date: Sun, 22 May 2011 08:33:23 -0400

 I wrote a program to compare the blocks in another copy of one of the
 large files in the dump with the version extracted by restore after
 applying my header reordering program.  The program reads each file in
 blocks of TP_BSIZE bytes and computes the SHA1 hash of each block,
 storing the resulting <hash, offset> pairs in a hash map per file.  It
 then unions the key sets of the two hash maps into a single master list
 of block hashes, and traverses that list to construct a map
 <offset, <offset0, offset1>> giving the correspondence between the
 blocks in the two files.  Finally, it prints the contents of that map
 in increasing order of offset, showing the differences between the two
 files.  Here is the initial part of the result:
 
 Lectures.zip.bad: 52469795 bytes
 Lectures.zip.good: 52469795 bytes
 11612   11622   10
 11613   11623   10
 11614   11624   10
 11615   11625   10
 11616   11626   10
 11617   11627   10
 11618   11628   10
 11619   11629   10
 11620   11630   10
 11621   11631   10
 11622   11632   10
 11623   11633   10
 11624   11634   10
 11625   11635   10
 11626   11636   10
 11627   11637   10
 11628   11638   10
 11629   11639   10
 11630   11640   10
 11631   11641   10
 11632   11612   -20
 11633   11613   -20
 11634   11614   -20
 11635   11615   -20
 11636   11616   -20
 11637   11617   -20
 11638   11618   -20
 11639   11619   -20
 11640   11620   -20
 11641   11621   -20
 11642   11652   10
 11643   11653   10
 11644   11654   10
 11645   11655   10
 11646   11656   10
 11647   11657   10
 11648   11658   10
 11649   11659   10
 11650   11660   10
 11651   11661   10
 11652   11662   10
 11653   11663   10
 11654   11664   10
 11655   11665   10
 11656   11666   10
 11657   11667   10
 11658   11668   10
 11659   11669   10
 11660   11670   10
 11661   11671   10
 11662   11642   -20
 11663   11643   -20
 11664   11644   -20
 11665   11645   -20
 11666   11646   -20
 11667   11647   -20
 11668   11648   -20
 11669   11649   -20
 11670   11650   -20
 11671   11651   -20
 11672   11682   10
 11673   11683   10
 
 The pattern repeats this way for *almost* the entire file:
 sets of 20 blocks occur 10 blocks ahead of the corresponding
 blocks in the other file, followed by a set of 10 blocks that
 occur 20 blocks behind the corresponding blocks in the other
 file.  There are occasional difference values of 9 and 19,
 which I don't have a ready explanation for, except that my
 header reordering relied on the magic number to identify
 header blocks, so a few data blocks may have been
 misidentified as headers.  At the end of the files there are
 a few blocks that do not correspond; these are probably due
 to alignment at the end, which caused some of the last data
 blocks to be used as the first blocks for the next file in
 the dump.
 
 To test my suspicion that this is a concurrency issue in dump,
 I recompiled dump after setting #define SLAVES 1 in tape.c
 (rather than the previous value of 3).  I was then able to
 complete two rounds of "dump 0f - /mail | restore rfN -"
 without any errors, whereas the stock /sbin/dump fails
 very quickly, as shown in the original PR.
 
 I am not familiar with the locking machinery used in dump, so I
 don't know whether I will be able to go further than this with a
 reasonable expenditure of time.  However, I strongly suggest
 that the concurrency in dump be turned off (perhaps by setting
 SLAVES to 1 as I did) until somebody can get to the bottom of
 this.  If this is happening to me, then I suspect there are
 *massive* numbers of bad dumps out there that people believe
 are good, and it will be a rude awakening when they try to read
 them back.  Since the data blocks carry no tape address
 information, such dumps cannot be recovered.
 
>Unformatted:
