From nobody@FreeBSD.org  Thu Jul 15 22:58:45 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 74F5A1065675
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 15 Jul 2010 22:58:45 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 63BB68FC14
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 15 Jul 2010 22:58:45 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o6FMwjBq026819
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 15 Jul 2010 22:58:45 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o6FMwjPX026818;
	Thu, 15 Jul 2010 22:58:45 GMT
	(envelope-from nobody)
Message-Id: <201007152258.o6FMwjPX026818@www.freebsd.org>
Date: Thu, 15 Jul 2010 22:58:45 GMT
From: Emil Smolenski <am@raisa.eu.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [zfs] Booting from a degraded raidz no longer works in 8-STABLE [regression]
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         148655
>Category:       kern
>Synopsis:       [zfs] Booting from a degraded raidz no longer works in 8-STABLE [regression]
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    mm
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jul 15 23:00:04 UTC 2010
>Closed-Date:    Mon Sep 06 11:59:53 UTC 2010
>Last-Modified:  Mon Sep 06 11:59:53 UTC 2010
>Originator:     Emil Smolenski
>Release:        FreeBSD 8-STABLE
>Organization:
>Environment:
FreeBSD 8.0-STABLE i386
>Description:
After upgrading an 8.0-RELEASE system to 8-STABLE, the machine no longer boots from a degraded raidz. Booting stops with the message:

error 1 lba 32
error 1 lba 1

Booting from a non-degraded raidz still works fine.

After encountering the problem, I prepared an isolated (qemu) environment to test this issue.

In 8.0-RELEASE everything works fine. I can remove one disk and the system boots. My configuration:

- gpart config:
=>     34  6291389  ad0  GPT  (3.0G)
       34      128    1  freebsd-boot  (64K)
      162  6291261    2  freebsd-zfs  (3.0G)

=>     34  6291389  ad1  GPT  (3.0G)
       34      128    1  freebsd-boot  (64K)
      162  6291261    2  freebsd-zfs  (3.0G)

=>     34  6291389  ad3  GPT  (3.0G)
       34      128    1  freebsd-boot  (64K)
      162  6291261    2  freebsd-zfs  (3.0G)

(with GPT labeling)

- zpool (v13) config:
        NAME            STATE     READ WRITE CKSUM
        bijou           ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            gpt/bijou0  ONLINE       0     0     0
            gpt/bijou1  ONLINE       0     0     0
            gpt/bijou2  ONLINE       0     0     0

- loader.conf:
zfs_load="YES"
vfs.root.mountfrom="zfs:bijou"

- rc.conf:
zfs_enable="YES"

After upgrading to 8-STABLE (world and kernel) and BEFORE reinstalling the bootcode, booting from a degraded raidz no longer works: http://img28.imageshack.us/img28/9118/raidz8stable.png . Note: installworld upgrades /boot/loader.

After reinstalling bootcode:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad3

booting stops with a message:

error 1 lba 32
error 1 lba 1

( http://img17.imageshack.us/img17/36/raidz8stablereinstalled.png )

Upgrading the zpool to v14 doesn't help. I think this is a serious regression: disks often fail during reboot, and if a degraded raidz cannot boot, the whole idea of raidz doesn't make any sense.
>How-To-Repeat:
1. Have a FreeBSD 8.0-RELEASE "RootOnZFS" raidz1 installation (like the one described here: http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/RAIDZ1 )
2. Upgrade to 8-STABLE.
3. Physically remove one disk from the raidz.
4. Boot the machine.
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Fri Jul 16 11:17:49 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148655 

From: Scott Johnson <scottj75074@yahoo.com>
To: bug-followup@FreeBSD.org, am@raisa.eu.org
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works in 8-STABLE [regression]
Date: Sun, 1 Aug 2010 13:00:26 -0700 (PDT)

 --0-22506440-1280692826=:68914
 Content-Type: text/plain; charset=us-ascii
 
 Another data point. I had a similar problem with mirrored boot disks.
 
 I installed 8.1-RELEASE onto two identical drives in a mirror, following the 
 "Installing FreeBSD Root on ZFS (Mirror) using GPT" guide here: 
 http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror
 
 I can boot from either disk by changing the boot priority in the BIOS, so long
 as both disks are connected. I swapped the sata ports, and I can still boot from
 either disk.
 
 I can disconnect drive #2 and boot from drive #1 just fine, whether drive #1 is 
 plugged into sata0 or sata1.
 
 However when I disconnect drive #1 and try to boot from drive #2, on either 
 sata0 or sata1, I get errors:
 
 error 1 lba 32
 error 1 lba 1
 error 1 lba 32
 error 1 lba 1
 error 1 lba 32
 error 1 lba 1
 error 1 lba 32
 error 1 lba 1
 error 1 lba 32
 error 1 lba 1
 error 1 lba 32
 error 1 lba 1
 No ZFS pools located, can't boot
 --0-22506440-1280692826=:68914--

From: George Kontostanos <gkontos.mail@gmail.com>
To: bug-followup@FreeBSD.org,
 am@raisa.eu.org
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works in 8-STABLE [regression]
Date: Mon, 2 Aug 2010 00:36:51 +0300

 Hi,
 
 I was able to reproduce this on a VM with a ZFS GPT mirror setup, also running
 8.1-RELEASE.
 
 Regards,
 
 George

From: Dan Naumov <dan.naumov@gmail.com>
To: bug-followup@FreeBSD.org, am@raisa.eu.org
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works 
	in 8-STABLE [regression]
Date: Mon, 2 Aug 2010 20:06:24 +0300

 A few more people are also confirming this issue here:
 http://forums.freebsd.org/showthread.php?t=16556

From: Lars Flatmo <larghio@gmail.com>
To: bug-followup@FreeBSD.org,
 am@raisa.eu.org
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works in 8-STABLE [regression]
Date: Wed, 4 Aug 2010 12:23:36 +0200

 I have the exact same problem with a freshly installed 8.1. I also reproduced
 it in VMware Fusion.

From: Andriy Gapon <avg@icyb.net.ua>
To: bug-followup@FreeBSD.org, am@raisa.eu.org
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Wed, 04 Aug 2010 14:33:08 +0300

 I would like to ask those who can reproduce the problem to try to use head
 (CURRENT) version, manually compile sys/boot/zfs/zfstest.c and try to use it to
 debug what exactly fails.
 
 -- 
 Andriy Gapon

From: Martin Matuska <mm@FreeBSD.org>
To: bug-followup@FreeBSD.org, Andriy Gapon <avg@icyb.net.ua>
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Thu, 05 Aug 2010 10:19:37 +0200

 I can confirm this behaviour, easily reproducible in VirtualBox.
 
 The problem must be in the internal logic of zfsboot, because the lba
 errors reported are from function:
 
 drvread() in sys/boot/i386/zfsboot/zfsboot.c, line #1079:
     if (V86_CY(v86.efl)) {
         printf("error %u lba %u\n", v86.eax >> 8 & 0xff, lba);
         return -1;
     }
 
 drvread() is called from vdev_read() (line #315)
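
The number in "error 1 lba 32" can be read straight off the printf() quoted above: it is bits 8..15 of EAX (the AH register), printed after the carry flag reports failure. The following is a minimal sketch of that extraction, assuming a plain 32-bit register value instead of the real v86 structure. The names bios_status() and format_drvread_error() are hypothetical helpers for illustration and do not exist in zfsboot.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: return bits 8..15 of EAX (the AH register).
 * In C, >> binds tighter than &, so this is the same value as the
 * expression `v86.eax >> 8 & 0xff` used in drvread().
 */
static unsigned
bios_status(uint32_t eax)
{
	return ((eax >> 8) & 0xff);
}

/*
 * Hypothetical helper: build the same text drvread() prints, without
 * the trailing newline. Returns the snprintf() result.
 */
static int
format_drvread_error(char *buf, size_t len, uint32_t eax, uint32_t lba)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "error %u lba %u",
	    bios_status(eax), (unsigned)lba));
}
```

With EAX = 0x0100 and lba = 32, format_drvread_error() produces the string "error 1 lba 32", matching the first line seen on the console.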

From: Andriy Gapon <avg@icyb.net.ua>
To: Martin Matuska <mm@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Thu, 05 Aug 2010 11:51:23 +0300

 on 05/08/2010 11:19 Martin Matuska said the following:
 > I can confirm this behaviour, easily reproducible in VirtualBox.
 > 
 > The problem must be in the internal logic of zfsboot, because the lba
 > errors reported are from function:
 > 
 > drvread() in sys/boot/i386/zfsboot/zfsboot.c, line #1079:
 >     if (V86_CY(v86.efl)) {
 >         printf("error %u lba %u\n", v86.eax >> 8 & 0xff, lba);
 >         return -1;
 >     }
 > 
 > drvread() is called from vdev_read() (line #315)
 
 Right, and this is the reason why I asked to try zfstest, because it would be
 interesting to see the whole stack trace to determine which high-level zfs
 operation fails.
 Thanks!
 -- 
 Andriy Gapon

From: Martin Matuska <mm@FreeBSD.org>
To: bug-followup@FreeBSD.org, Andriy Gapon <avg@icyb.net.ua>
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Thu, 05 Aug 2010 13:53:44 +0200

 So I have done more code reading and debugging with mfsBSD in virtualbox
 and I came to the following conclusion:
 
 sys/boot/zfs/zfsimpl.c reads vdev information from the pool but there is
 no check if these vdevs do exist on physical devices. In other words, if
 the pool has last seen its vdevs as HEALTHY, gptzfsboot assumes all of
 them are available.
 
 So, for example, in the case of a mirror, vdev_mirror_read() tries to
 read from the first "healthy" vdev in its list. If the first vdev is the
 missing vdev (e.g. a disconnected or failed drive), it cannot read
 from it, so you are unable to boot.
 
 In my test setup, vdev_mirror_read() reported two healthy kids and tried
 to read from the non-existent vdev.
 
 I think in the boot case, we should first scan for all physically
 available vdevs, then scan for children from their configuration. All
 child vdevs that cannot be physically opened (do not have a
 representation from the previous scan) should be set to state
 VDEV_STATE_CANT_OPEN and not assumed as VDEV_STATE_HEALTHY.
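
The two-pass idea in the last paragraph can be sketched roughly as below. This is an illustration under stated assumptions, not the zfsimpl.c implementation: child_vdev_t, mark_missing_children() and the matching by GUID are hypothetical, and the enum is a two-value stand-in that only borrows the state names mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in: the real state enum has more values. */
typedef enum {
	VDEV_STATE_CANT_OPEN,
	VDEV_STATE_HEALTHY
} vdev_state_t;

/* Hypothetical child record as read from the pool configuration. */
typedef struct {
	uint64_t	guid;	/* assumed identifier for matching */
	vdev_state_t	state;	/* state as last recorded in the pool */
} child_vdev_t;

/*
 * Mark every configured child that was not seen during the physical
 * device scan as VDEV_STATE_CANT_OPEN. Returns the number of children
 * that were marked.
 */
static size_t
mark_missing_children(child_vdev_t *kids, size_t nkids,
    const uint64_t *found, size_t nfound)
{
	size_t missing = 0;

	assert(kids != NULL || nkids == 0);
	assert(found != NULL || nfound == 0);
	for (size_t i = 0; i < nkids; i++) {
		int present = 0;

		for (size_t j = 0; j < nfound; j++) {
			if (found[j] == kids[i].guid) {
				present = 1;
				break;
			}
		}
		if (!present) {
			kids[i].state = VDEV_STATE_CANT_OPEN;
			missing++;
		}
	}
	return (missing);
}
```

Children that were found keep whatever state the pool configuration recorded, so only the absent ones are downgraded.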

From: Andriy Gapon <avg@icyb.net.ua>
To: Martin Matuska <mm@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Thu, 05 Aug 2010 15:21:06 +0300

 on 05/08/2010 14:53 Martin Matuska said the following:
 > In my test setup, vdev_mirror_read() reported two healthy kids and tried
 > to read from the non-existent vdev.
 
 What happened next?
 If I read vdev_mirror_read() code correctly it should continue to the next device
 if reading from the current device fails.
 
 -- 
 Andriy Gapon

From: Martin Matuska <mm@FreeBSD.org>
To: bug-followup@FreeBSD.org, am@raisa.eu.org
Cc: Andriy Gapon <avg@icyb.net.ua>, Pawel Jakub Dawidek <pjd@freebsd.org>, 
 Xin LI <delphij@freebsd.org>
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Thu, 05 Aug 2010 18:23:43 +0200

 This is a multi-part message in MIME format.
 --------------050900020709060502070203
 Content-Type: text/plain; charset=windows-1250
 Content-Transfer-Encoding: 7bit
 
 A proposed patch is attached.
 
 The function vdev_read_phys() (sys/boot/zfs/zfsimpl.c, #325) calls
 vdev->v_phys_read() without checking whether that function is registered.
 
 This check should be done in vdev_read_phys before doing anything else.
 
 vdev_create initializes vdev->v_phys_read as 0 and unavailable vdevs
 keep this value.
 
 --------------050900020709060502070203
 Content-Type: text/plain;
  name="head-zfsimpl.c.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="head-zfsimpl.c.patch"
 
 Index: sys/boot/zfs/zfsimpl.c
 ===================================================================
 --- sys/boot/zfs/zfsimpl.c	(revision 210854)
 +++ sys/boot/zfs/zfsimpl.c	(working copy)
 @@ -328,6 +328,9 @@
  	size_t psize;
  	int rc;
  
 +	if (!vdev->v_phys_read)
 +		return (EIO);
 +
  	if (bp) {
  		psize = BP_GET_PSIZE(bp);
  	} else {
 
 --------------050900020709060502070203--
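
How the guard in this patch interacts with a mirror read can be sketched as follows. This is a simplified, self-contained model and not the code in zfsimpl.c: the struct has a single field, the read callback takes fewer arguments than the real one presumably does, and vdev_read_phys_sketch(), mirror_read_sketch() and fake_disk_read() are hypothetical names. The fallback loop follows the behaviour described earlier in this thread (continue to the next device when a read fails).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct vdev vdev_t;

/* Simplified callback type; the real signature is assumed to differ. */
typedef int vdev_phys_read_t(vdev_t *vdev, void *buf, size_t size);

struct vdev {
	vdev_phys_read_t *v_phys_read;	/* NULL when no disk backs the vdev */
};

/* Same guard as the patch: refuse to call an unregistered reader. */
static int
vdev_read_phys_sketch(vdev_t *vdev, void *buf, size_t size)
{
	assert(vdev != NULL);
	if (!vdev->v_phys_read)
		return (EIO);
	return (vdev->v_phys_read(vdev, buf, size));
}

/* Try each mirror child in order; the first successful read wins. */
static int
mirror_read_sketch(vdev_t **kids, size_t nkids, void *buf, size_t size)
{
	for (size_t i = 0; i < nkids; i++) {
		if (vdev_read_phys_sketch(kids[i], buf, size) == 0)
			return (0);
	}
	return (EIO);
}

/* Stand-in for a working disk: fills the buffer with a marker byte. */
static int
fake_disk_read(vdev_t *vdev, void *buf, size_t size)
{
	unsigned char *p = buf;

	(void)vdev;
	for (size_t i = 0; i < size; i++)
		p[i] = 0xAB;
	return (0);
}
```

With a child list of { missing, present }, where only the second has v_phys_read set, the EIO from the first child lets the loop move on and the read succeeds from the second.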

From: Andriy Gapon <avg@icyb.net.ua>
To: Martin Matuska <mm@FreeBSD.org>
Cc: bug-followup@FreeBSD.org, am@raisa.eu.org,
        Pawel Jakub Dawidek <pjd@FreeBSD.org>, Xin LI <delphij@FreeBSD.org>
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Thu, 05 Aug 2010 19:38:17 +0300

 on 05/08/2010 19:23 Martin Matuska said the following:
 > A proposed patch is attached.
 > 
 > The function vdev_read_phys() (sys/boot/zfs/zfsimpl.c, #325) does call
 > vdev->v_phys_read() without checking if that function is registered.
 > 
 > This check should be done in vdev_read_phys before doing anything else.
 > 
 > vdev_create initializes vdev->v_phys_read as 0 and unavailable vdevs
 > keep this value.
 
 Looks very good.
 Thanks!
 
 -- 
 Andriy Gapon
Responsible-Changed-From-To: freebsd-fs->mm 
Responsible-Changed-By: mm 
Responsible-Changed-When: Sun Aug 8 17:53:09 UTC 2010 
Responsible-Changed-Why:  
I am taking this PR. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148655 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/148655: commit references a PR
Date: Mon,  9 Aug 2010 06:36:23 +0000 (UTC)

 Author: mm
 Date: Mon Aug  9 06:36:11 2010
 New Revision: 211091
 URL: http://svn.freebsd.org/changeset/base/211091
 
 Log:
   Return EIO if vdev->v_phys_read is NULL.
   
   This fixes booting from a ZFS mirror with a unavailable primary device.
   
   PR:		kern/148655
   Reviewed by:	avg
   Approved by:	delphij (mentor)
   MFC after:	3 days
 
 Modified:
   head/sys/boot/zfs/zfsimpl.c
 
 Modified: head/sys/boot/zfs/zfsimpl.c
 ==============================================================================
 --- head/sys/boot/zfs/zfsimpl.c	Mon Aug  9 06:02:23 2010	(r211090)
 +++ head/sys/boot/zfs/zfsimpl.c	Mon Aug  9 06:36:11 2010	(r211091)
 @@ -328,6 +328,9 @@ vdev_read_phys(vdev_t *vdev, const blkpt
  	size_t psize;
  	int rc;
  
 +	if (!vdev->v_phys_read)
 +		return (EIO);
 +
  	if (bp) {
  		psize = BP_GET_PSIZE(bp);
  	} else {
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: "Emil Smolenski" <am@raisa.eu.org>
To: "Martin Matuska" <mm@freebsd.org>, bug-followup@freebsd.org
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Mon, 09 Aug 2010 23:47:09 +0200

 On Thu, 05 Aug 2010 18:23:43 +0200, Martin Matuska <mm@freebsd.org> wrote:
 
 > A proposed patch is attached.
 
 It works for me on 8.1-PRERELEASE. Many thanks! (Sorry for the delay, I  
 was on a short vacation.)
 
 -- 
 am

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/148655: commit references a PR
Date: Thu, 12 Aug 2010 11:04:28 +0000 (UTC)

 Author: mm
 Date: Thu Aug 12 05:59:55 2010
 New Revision: 211205
 URL: http://svn.freebsd.org/changeset/base/211205
 
 Log:
   MFC r211091:
   
   Return EIO if vdev->v_phys_read is NULL.
   
   This fixes booting from a ZFS mirror with a unavailable primary device.
   
   PR:		kern/148655
   Reviewed by:	avg
   Approved by:	delphij (mentor)
 
 Modified:
   stable/8/sys/boot/zfs/zfsimpl.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cam/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
 
 Modified: stable/8/sys/boot/zfs/zfsimpl.c
 ==============================================================================
 --- stable/8/sys/boot/zfs/zfsimpl.c	Thu Aug 12 01:08:50 2010	(r211204)
 +++ stable/8/sys/boot/zfs/zfsimpl.c	Thu Aug 12 05:59:55 2010	(r211205)
 @@ -328,6 +328,9 @@ vdev_read_phys(vdev_t *vdev, const blkpt
  	size_t psize;
  	int rc;
  
 +	if (!vdev->v_phys_read)
 +		return (EIO);
 +
  	if (bp) {
  		psize = BP_GET_PSIZE(bp);
  	} else {

From: Dan Naumov <dan.naumov@gmail.com>
To: bug-followup@FreeBSD.org, am@raisa.eu.org
Cc:  
Subject: Re: kern/148655: [zfs] Booting from a degraded raidz no longer works
 in 8-STABLE [regression]
Date: Mon, 30 Aug 2010 22:27:15 +0300

 So is this going into an errata patch for 8.1 anytime soon? My 8.0
 system is a "root on zfs mirror", so this bug is pretty much holding
 me back from upgrading to 8.1.
 
 
 - Sincerely,
 Dan Naumov
State-Changed-From-To: open->closed 
State-Changed-By: mm 
State-Changed-When: Mon Sep 6 11:59:52 UTC 2010 
State-Changed-Why:  
Fixed in 9-CURRENT and 8-STABLE. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148655 
>Unformatted:
