From nobody@FreeBSD.org  Thu Mar 20 06:25:08 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BCCAC1065672
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 20 Mar 2008 06:25:08 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id AA7A78FC25
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 20 Mar 2008 06:25:08 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m2K6P75l090172
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 20 Mar 2008 06:25:07 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m2K6P7TG090171;
	Thu, 20 Mar 2008 06:25:07 GMT
	(envelope-from nobody)
Message-Id: <200803200625.m2K6P7TG090171@www.freebsd.org>
Date: Thu, 20 Mar 2008 06:25:07 GMT
From: Stef Walter <stef@memberwebs.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Drive detached from Intel Matrix RAID and returned comes up as entirely new ataraid
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         121899
>Category:       kern
>Synopsis:       [ar] [patch] Drive detached from Intel Matrix RAID and returned comes up as entirely new ataraid
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 20 06:30:01 UTC 2008
>Closed-Date:    Fri Mar 12 07:15:52 UTC 2010
>Last-Modified:  Fri Mar 12 07:15:52 UTC 2010
>Originator:     Stef Walter
>Release:        FreeBSD 6.3 and FreeBSD 7.0
>Organization:
None
>Environment:
FreeBSD new1.web.ws.local 6.3-RELEASE-p1 FreeBSD 6.3-RELEASE-p1 #10: Wed Mar 19 17:05:41 UTC 2008     root@new1.web.local:/usr/obj/usr/src/sys/RACK1  i386
>Description:
Note: This pertains to ataraid RAID devices using the Intel MatrixRAID
hardware.

A drive that was once part of an ataraid device, when added back to the
machine that it was on, shows up as a new ataraid device. This new ataraid
device tries to use all the drives that were originally in the RAID.
Results can range from confusion to a real mess.

>How-To-Repeat:
 1. Build a RAID1 device with two drives on Intel MatrixRAID hardware.
    This creates 'ar0'

    # atacontrol create RAID1 ad4 ad6
    ar0 created

 2. Shut down the machine and remove ad6. Or, if you really want to
    simulate a failure, jerk it from its socket :)

 3. When the machine restarts (it'll panic unless you apply the patch
    from pr/102211) you'll see the RAID is degraded:

    # atacontrol status ar0
    ar0: ATA RAID1 status: DEGRADED
      subdisks:
        0 ad4   DOWN
        1 ----- MISSING

 4. Reattach the removed drive (ad6) and a new RAID 'ar1' will appear
    with ad6. It tries to use ad4 as well, but it's already in use by
    'ar0'.

>Fix:
Don't rewrite the config_id of the RAID every time something changes.
That's what the generation is for. The config_id should remain the same
for the lifetime of the RAID. We need to be diligent about incrementing
the generation whenever the RAID status changes, including on boot in
case of a DEGRADED array.

This fix causes ad6 (in the example above) to be recognized correctly
as an out-of-date member of an already present RAID.



Patch attached with submission follows:

--- sys/dev/ata/ata-raid.c.orig	2008-03-19 11:20:15.000000000 +0000
+++ sys/dev/ata/ata-raid.c	2008-03-19 21:53:37.000000000 +0000
@@ -848,10 +848,17 @@
 	rdp->status &= ~AR_S_READY;
     }
 
+    /* 
+     * Note that when the array breaks or comes up broken we 
+     * force a write of the array config to the remaining 
+     * drives so that the generation will be incremented past 
+     * those of the missing or failed drives (in all cases).
+     */
     if (rdp->status != status) {
 	if (!(rdp->status & AR_S_READY)) {
 	    printf("ar%d: FAILURE - %s array broken\n",
 		   rdp->lun, ata_raid_type(rdp));
+            writeback = 1;
 	}
 	else if (rdp->status & AR_S_DEGRADED) {
 	    if (rdp->type & (AR_T_RAID1 | AR_T_RAID01))
@@ -860,6 +867,7 @@
 		printf("ar%d: WARNING - parity", rdp->lun);
 	    printf(" protection lost. %s array in DEGRADED mode\n",
 		   ata_raid_type(rdp));
+            writeback = 1;
 	}
     }
     mtx_unlock(&rdp->lock);
@@ -2233,11 +2242,16 @@
     }
 
     rdp->generation++;
-    microtime(&timestamp);
+
+    /* Generate a new config_id if none exists */
+    if (!rdp->magic_0) {
+        microtime(&timestamp);
+	rdp->magic_0 = timestamp.tv_sec ^ timestamp.tv_usec;
+    } 
 
     bcopy(INTEL_MAGIC, meta->intel_id, sizeof(meta->intel_id));
     bcopy(INTEL_VERSION_1100, meta->version, sizeof(meta->version));
-    meta->config_id = timestamp.tv_sec;
+    meta->config_id = rdp->magic_0;
     meta->generation = rdp->generation;
     meta->total_disks = rdp->total_disks;
     meta->total_volumes = 1;                                    /* XXX SOS */


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: mav 
State-Changed-When: Fri Mar 12 07:10:24 UTC 2010 
State-Changed-Why:  
Patch committed two years ago at r177452 and present on 8-STABLE. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=121899 
>Unformatted:
