From bruce@ecserv1.uwaterloo.ca  Sat Mar 22 19:03:39 2003
Return-Path: <bruce@ecserv1.uwaterloo.ca>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BD65437B401
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 22 Mar 2003 19:03:39 -0800 (PST)
Received: from ecserv1.uwaterloo.ca (ecserv1.uwaterloo.ca [129.97.50.121])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2213543F3F
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 22 Mar 2003 19:03:39 -0800 (PST)
	(envelope-from bruce@ecserv1.uwaterloo.ca)
Received: from ecserv1.uwaterloo.ca (localhost.uwaterloo.ca [127.0.0.1])
	by ecserv1.uwaterloo.ca (8.12.6/8.12.6) with ESMTP id h2N33faC075485;
	Sat, 22 Mar 2003 22:03:41 -0500 (EST)
	(envelope-from bruce@ecserv1.uwaterloo.ca)
Received: (from bruce@localhost)
	by ecserv1.uwaterloo.ca (8.12.6/8.12.6/Submit) id h2N33fZ6075484;
	Sat, 22 Mar 2003 22:03:41 -0500 (EST)
Message-Id: <200303230303.h2N33fZ6075484@ecserv1.uwaterloo.ca>
Date: Sat, 22 Mar 2003 22:03:41 -0500 (EST)
From: Bruce Campbell <bruce@engmail.uwaterloo.ca>
Reply-To: Bruce Campbell <bruce@engmail.uwaterloo.ca>
To: FreeBSD-gnats-submit@freebsd.org
Cc: bruce@engmail.uwaterloo.ca
Subject: 3ware RAID 5 resulting in data corruption
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         50201
>Category:       kern
>Synopsis:       [twe] 3ware RAID 5 resulting in data corruption
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Mar 22 19:10:05 PST 2003
>Closed-Date:    Thu Mar 29 05:48:15 GMT 2007
>Last-Modified:  Thu Mar 29 05:48:15 GMT 2007
>Originator:     Bruce Campbell
>Release:        FreeBSD 4.7-RELEASE i386
>Organization:
University of Waterloo
>Environment:
System: FreeBSD ecserv1.uwaterloo.ca 4.7-RELEASE FreeBSD 4.7-RELEASE #0: Wed Oct 9 15:08:34 GMT 2002 root@builder.freebsdmall.com:/usr/obj/usr/src/sys/GENERIC i386
>Description:
A system with a 3ware 7500-8 card and six Western Digital
200 GB drives in RAID 5 repeatably experiences data corruption.

Complete details are at:

http://www.freebsd.uwaterloo.ca/twiki/bin/view/Freebsd/BackupServerProblem

The 3ware card has up to date firmware.  3ware tech support advises:

"We do not support FreeBSD plus the current driver for FreeBSD has not been updated for some time to keep up with firmware changes. 

Please try linux instead"

The author of the driver, Michael Smith, advises he has not worked on the driver
in a number of years.

Linux and Windows users have also reported corruption when
drivers and firmware were not in sync.

>How-To-Repeat:

Install the system, copy roughly 250GB of data to the partition, and then
compare the copies to the originals.  Some of the files show
corruption on 16K boundaries.
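
The comparison step can be sketched in Python as follows.  This is a
minimal illustration, not the tool used for this report: the function
name differing_blocks and the example paths are hypothetical, and the
16K block size is taken from the boundary observed above.

```python
BLOCK_SIZE = 16 * 1024  # 16K, the corruption boundary reported above


def differing_blocks(original_path, copy_path, block_size=BLOCK_SIZE):
    """Return the byte offsets of every block in which the two files differ.

    Both files are read in fixed-size blocks; a block counts as different
    if its bytes differ or if one file ends before the other.
    """
    offsets = []
    with open(original_path, "rb") as orig, open(copy_path, "rb") as copy:
        offset = 0
        while True:
            a = orig.read(block_size)
            b = copy.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                offsets.append(offset)
            offset += block_size
    return offsets
```

Calling differing_blocks("/original/file", "/array/file") (both paths
are placeholders) returns a list of offsets.  An empty list means the
copy matches, and every reported offset is a multiple of the block size,
which makes it easy to see whether damage clusters on 16K boundaries.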

>Fix:
The use of RAID 10 instead of RAID 5 works fine.  The use
of RAID 5 with 3ware's Write Cache turned off works fine (but slowly
on writes).

>Release-Note:
>Audit-Trail:

From: Bruce Campbell <bruce@scimail.uwaterloo.ca>
To: David Kirchner <dpk@dpk.net>
Cc: bug-followup@FreeBSD.org, Michael Hurst <mshurst@engmail.uwaterloo.ca>
Subject: Re: [engmail] Re: kern/50201: [twe] 3ware RAID 5 resulting in data corruption
Date: Wed, 09 Nov 2005 16:28:00 -0500

 Quoting David Kirchner <dpk@dpk.net>:
 
 > I'm wondering if this bug is still in the system. I've personally been
 > using 3ware cards on 4.5 machines and beyond without data corruption:
 > RAID5 w/ WC enabled and other RAID configurations (1, 10).
 > 
 > The driver has also been updated relatively recently. Bruce, I'm not
 > suggesting that this bug is definitely fixed, but I'm curious, are you
 > still having trouble with RAID5? If not, can we close this bug?
 > 
 
 I left the department that owned the 2 systems where we had troubles.
 I will ask the admin for them (Michael Hurst).
 
 -- 
 Bruce Campbell
 Manager, Science Computing
 C2-260
 University of Waterloo
 (519)888-4567 ext 6991
 

From: David Kirchner <dpk@dpk.net>
To: bug-followup@FreeBSD.org, bruce@engmail.uwaterloo.ca
Cc:  
Subject: Re: kern/50201: [twe] 3ware RAID 5 resulting in data corruption
Date: Wed, 9 Nov 2005 12:16:16 -0800

 I'm wondering if this bug is still in the system. I've personally been
 using 3ware cards on 4.5 machines and beyond without data corruption:
 RAID5 w/ WC enabled and other RAID configurations (1, 10).
 
 The driver has also been updated relatively recently. Bruce, I'm not
 suggesting that this bug is definitely fixed, but I'm curious, are you
 still having trouble with RAID5? If not, can we close this bug?
State-Changed-From-To: open->feedback 
State-Changed-By: linimon 
State-Changed-When: Thu Nov 10 14:48:33 GMT 2005 
State-Changed-Why:  
Feedback requested. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=50201 

From: Jan Srzednicki <w@wrzask.pl>
To: bug-followup@freebsd.org, bruce@engmail.uwaterloo.ca, dpk@dpk.net
Cc:  
Subject: Re: kern/50201: [twe] 3ware RAID 5 resulting in data corruption
Date: Sat, 26 Nov 2005 16:58:36 +0100

 I'm experiencing a similar problem, though with a few notable
 differences.
 
 First of all, I'm running FreeBSD 5.4-RELEASE (with RELENG_5_4 fixes) on
 my machine. Here's a brief output from my dmesg related to the 3ware
 controller:
 
 [16:32] hostname:~ # dmesg | grep twe
 twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0xcc00-0xcc0f mem 0xfe000000-0xfe7fffff irq 21 at device 0.0 on pci2
 twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
 twed0: <Unit 0, RAID5, Normal> on twe0
 twed0: 1192370MB (2441975040 sectors)
 
 The controller is a 7000-class 8-way RAID controller with PATA
 interfaces.
 
 I'm experiencing repeatable data corruption, but it was far more
 difficult to pin it down. I'm using the array for backups, which I'm
 doing via ssh over the network (100Mbit ethernet) in the following way:
 
 dump | gzip | md5checker | network(ssh) | md5checker | split twe0/files
 
 md5checker is my small utility to calculate md5 sums of each 1MB chunk
 of data piped through it. It assured me that data corruption does not
 occur on the network, as the MD5 sums on both sides match. The
 total size of the backed-up data after gzipping is about 43GB.
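 
 The source of md5checker is not included in this report.  A minimal
 Python sketch of a filter with the behaviour described above might look
 like the following; the names read_exact and md5_passthrough and the
 "index digest" log format are assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as described in the report


def read_exact(stream, size):
    """Read up to `size` bytes, looping over short reads from a pipe."""
    parts = []
    remaining = size
    while remaining > 0:
        data = stream.read(remaining)
        if not data:
            break  # end of stream
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def md5_passthrough(src, dst, log, chunk_size=CHUNK_SIZE):
    """Copy src to dst unchanged, logging one MD5 digest per chunk.

    Returns the number of chunks processed. Chunks are cut at fixed
    offsets so that two instances on either side of a network link
    produce comparable logs.
    """
    index = 0
    while True:
        chunk = read_exact(src, chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        log.write(f"{index} {hashlib.md5(chunk).hexdigest()}\n")
        index += 1
    dst.flush()
    return index


# As a pipe filter (hypothetical wiring), one would call:
#   md5_passthrough(sys.stdin.buffer, sys.stdout.buffer, sys.stderr)
```

 Running one instance before the ssh hop and one after it, then
 comparing the two logs line by line, shows whether any chunk changed in
 transit.  Comparing the receiving side's log against digests of the
 files later read back from the array would narrow a mismatch down to
 the storage path.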
 
 The strange thing was that performing _the same_ backup in the following
 way:
 
 dump | gzip > file
 cat file | md5checker | network(ssh) | md5checker | split twe0/files
 
 .. did not produce any errors (I repeated both "ways" several times, to
 make sure). Well, it appears that the data corruption is somehow related
 to the speed of the data transmission, as dump output is quite irregular
 and becomes rather slow when it hits a bunch of small files. The whole
 dump process takes about 6 hours. 
 
 I tried dumping the data into an IDE disk on the machine with the
 controller, which resulted in no errors. I also tried turning off
 softupdates on the filesystem on the 3ware array, with no effect. It
 clearly appears the data corruption is somehow related to the 3ware
 controller.
 
 After some investigation, I've discovered the following facts:
  - data is corrupted in exact 128kB chunks; the whole 128kB is bad and
    appears to be random (that is, I could not find any similar chunk in
    other files on the partition).
  - errors are pretty rare; in the whole 43GB stream I'm getting about 3
    or 4 errors.
  - I'm not able to repeat data corruption locally. Things like:
  
 	cat /dev/(zero|urandom) | md5checker | split array/files
   
    .. did not produce _any_ errors, after piping about a terabyte of
    data.
 
 It also appears that turning off write-cache on the controller fixed the
 problem, but writes are very slow now.
 
 I don't have another 3ware controller, so I cannot check if it isn't a
 hardware issue within it.
 
 I'm of course willing to provide any feedback needed on this issue, but
 because of the duration of the process, testing is rather slow.
 
State-Changed-From-To: feedback->open 
State-Changed-By: linimon 
State-Changed-When: Sun Dec 4 02:21:01 GMT 2005 
State-Changed-Why:  
Feedback received, although not from original submitter.  It seems as though 
the problem persists. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=50201 
Responsible-Changed-From-To: freebsd-bugs->vkashyap 
Responsible-Changed-By: ceri 
Responsible-Changed-When: Wed May 10 15:39:42 UTC 2006 
Responsible-Changed-Why:  
Assign to maintainer. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=50201 
Responsible-Changed-From-To: vkashyap->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed May 24 19:46:20 UTC 2006 
Responsible-Changed-Why:  
vkashyap has passed maintenance of twe to aradford@amcc.com. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=50201 

From: Harrison Grundy <astrodog@gmail.com>
To: bug-followup@FreeBSD.org,  bruce@engmail.uwaterloo.ca
Cc:  
Subject: Re: kern/50201: [twe] 3ware RAID 5 resulting in data corruption
Date: Wed, 21 Mar 2007 22:54:07 -0500

 Does this issue still occur? If so, have you tried updating the firmware 
 of your controller, and seeing if that fixes it? (I've run into some 
 oddball issues on 3ware controllers if I don't keep the firmware current).
State-Changed-From-To: open->closed 
State-Changed-By: remko 
State-Changed-When: Thu Mar 29 05:48:14 UTC 2007 
State-Changed-Why:  
feedback timeout 

http://www.freebsd.org/cgi/query-pr.cgi?pr=50201 
>Unformatted:
