From olgeni@FreeBSD.org  Thu Jul  1 16:09:37 2010
Return-Path: <olgeni@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E7ECB1065670;
	Thu,  1 Jul 2010 16:09:37 +0000 (UTC)
	(envelope-from olgeni@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id BDBC68FC14;
	Thu,  1 Jul 2010 16:09:37 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o61G9axh057521;
	Thu, 1 Jul 2010 16:09:36 GMT
	(envelope-from olgeni@freefall.freebsd.org)
Received: (from olgeni@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o61G9aC7057520;
	Thu, 1 Jul 2010 16:09:36 GMT
	(envelope-from olgeni)
Message-Id: <201007011609.o61G9aC7057520@freefall.freebsd.org>
Date: Thu, 1 Jul 2010 16:09:36 GMT
From: Jimmy Olgeni <olgeni@freebsd.org>
Reply-To: Jimmy Olgeni <olgeni@freebsd.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc: rnoland@freebsd.org
Subject: [zfs] [loader] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         148296
>Category:       bin
>Synopsis:       [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-fs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jul 01 16:10:06 UTC 2010
>Closed-Date:    Thu Sep 15 20:44:35 UTC 2011
>Last-Modified:  Thu Sep 15 20:44:35 UTC 2011
>Originator:     Jimmy Olgeni
>Release:        FreeBSD 8.1-RC1 i386
>Organization:
>Environment:
System: FreeBSD backoffice 8.1-RC1 FreeBSD 8.1-RC1 #0: Fri Jun 25 21:42:58 CEST 2010 root@backoffice:/usr/obj/usr/src/sys/RELENG_8.i386 i386
>Description:

Code in /usr/src/sys/boot/zfs/zfs.c (zfs_dev_init) probes all
possible disks/partitions to reconstruct ZFS pools.

Until rev 198420 it probed only 4 partitions per disk, which caused
no harm. Since rev 198420 it probes all 128 possible GPT partitions,
causing very long boot times.

The problem is that for each possible GPT partition (diskNpM) that
cannot be opened, the code falls back to an MBR slice (diskNsM).

After falling back to a slice, the code does not stop at the first
diskNpM partition that could not be opened; it keeps probing missing
partitions up to index 128 (which takes a while).

If the partition index is > 4, the fallback probes slices that
cannot possibly exist (diskNs5, ...), thus doubling the already
long processing time.
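
To put rough numbers on the description above, here is a minimal sketch that counts open() attempts for one disk. It is not the loader code: mock_open(), the present[] table and both count_opens_* functions are hypothetical stand-ins, and the assumed layout is a single GPT disk with partitions 1 to 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: disk0 carries GPT partitions 1..3 and nothing else. */
static const char *present[] = { "disk0p1:", "disk0p2:", "disk0p3:" };
static int open_calls;

/* Hypothetical stand-in for the loader's open(): counts every attempt. */
static int mock_open(const char *name)
{
	size_t i;

	open_calls++;
	for (i = 0; i < sizeof(present) / sizeof(present[0]); i++)
		if (strcmp(name, present[i]) == 0)
			return ((int)i + 3);	/* any non-negative fd */
	return (-1);
}

/* Loop shape before the patch: each failed diskNpM falls back to diskNsM. */
int count_opens_fallback(int unit)
{
	char devname[32];
	int slice, fd;

	assert(unit >= 0);
	open_calls = 0;
	for (slice = 1; slice <= 128; slice++) {
		snprintf(devname, sizeof(devname), "disk%dp%d:", unit, slice);
		fd = mock_open(devname);
		if (fd == -1) {
			snprintf(devname, sizeof(devname), "disk%ds%d:", unit, slice);
			fd = mock_open(devname);
			if (fd == -1)
				continue;
		}
	}
	return (open_calls);
}

/* Loop shape with separate loops: 4 slices, then partitions until a gap. */
int count_opens_split(int unit)
{
	char devname[32];
	int slice, fd;

	assert(unit >= 0);
	open_calls = 0;
	for (slice = 1; slice <= 4; slice++) {
		snprintf(devname, sizeof(devname), "disk%ds%d:", unit, slice);
		(void)mock_open(devname);
	}
	for (slice = 1; slice <= 128; slice++) {
		snprintf(devname, sizeof(devname), "disk%dp%d:", unit, slice);
		fd = mock_open(devname);
		if (fd == -1)
			break;
	}
	return (open_calls);
}
```

With this assumed layout, count_opens_fallback(0) makes 253 attempts and count_opens_split(0) makes 8. The real cost per attempt depends on the BIOS disk code, which this sketch does not model.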

>How-To-Repeat:
>Fix:

Since we are looking for boot partitions, I think it should be safe
to assume that there are no empty partitions before them.

I moved slice probing into a separate loop to avoid any issues with
fallback code.

With the attached patch my 8.1-RC1 zfs-only server got back its old
boot time.

*** zfs.c.orig	Thu Jul  1 17:14:42 2010
--- zfs.c	Thu Jul  1 17:40:03 2010
***************
*** 413,427 ****
  		if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
  			close(fd);
  
  		for (slice = 1; slice <= 128; slice++) {
  			sprintf(devname, "disk%dp%d:", unit, slice);
  			fd = open(devname, O_RDONLY);
! 			if (fd == -1) {
! 				sprintf(devname, "disk%ds%d:", unit, slice);
! 				fd = open(devname, O_RDONLY);
! 				if (fd == -1)
! 					continue;
! 			}
  			if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
  				close(fd);
  		}
--- 413,432 ----
  		if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
  			close(fd);
  
+ 		for (slice = 1; slice <= 4; slice++) {
+ 			sprintf(devname, "disk%ds%d:", unit, slice);
+ 			fd = open(devname, O_RDONLY);
+ 			if (fd == -1)
+ 				continue;
+ 			if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
+ 				close(fd);
+ 		}
+ 
  		for (slice = 1; slice <= 128; slice++) {
  			sprintf(devname, "disk%dp%d:", unit, slice);
  			fd = open(devname, O_RDONLY);
! 			if (fd == -1)
! 				break;
  			if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
  				close(fd);
  		}
>Release-Note:
>Audit-Trail:

From: Norikatsu Shigemura <nork@FreeBSD.org>
To: Jimmy Olgeni <olgeni@freebsd.org>
Cc: FreeBSD-gnats-submit@freebsd.org, rnoland@freebsd.org
Subject: Re: bin/148296: [zfs] [loader] Very slow probe in
 /usr/src/sys/boot/zfs/zfs.c
Date: Sat, 3 Jul 2010 01:40:19 +0900

 Hi olgeni.
 
 On Thu, 1 Jul 2010 16:09:36 GMT
 Jimmy Olgeni <olgeni@freebsd.org> wrote:
 > >Number:         148296
 > >Category:       bin
 > >Synopsis:       [zfs] [loader] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
 
 > Since we are looking for boot partitions, I think it should be safe
 > to assume that there are no empty partitions before them.
 > I moved slice probing into a separate loop to avoid any issues with
 > fallback code.
 > With the attached patch my 8.1-RC1 zfs-only server got back its old
 > boot time.
 
 	That's good news!  I confirmed that your patch works in my GPT-only
 	9-CURRENT environment.  I'm now testing it in my MBR environment
 	(currently running make world).
 
 -- 
 Norikatsu Shigemura <nork@FreeBSD.org>
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed Jul 14 12:26:15 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 

From: Andriy Gapon <avg@freebsd.org>
To: bug-followup@freebsd.org, olgeni@freebsd.org
Cc:  
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
Date: Wed, 14 Jul 2010 16:07:56 +0300

 Another thing that most likely can be improved - if the whole disk is found to be
 a vdev of a ZFS pool, then it doesn't make sense to try to probe partitions/slices
 on the disk.  Or does it?
 
 But that's an extra.  The patch looks perfect.
 
 -- 
 Andriy Gapon

From: Jimmy Olgeni <olgeni@FreeBSD.org>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in
 /usr/src/sys/boot/zfs/zfs.c
Date: Wed, 14 Jul 2010 16:15:49 +0200 (CEST)

 Hello,
 
 This should do it, but I don't have the right configuration to test it 
 now. The "else continue" should skip the slice and partition probes for 
 a unit when a vdev was spotted on the top-level device.
 
 --- zfs.c.orig	2010-06-14 04:09:06.000000000 +0200
 +++ zfs.c	2010-07-14 16:04:49.808159404 +0200
 @@ -412,16 +412,23 @@
   		 */
   		if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
   			close(fd);
 +		else
 +			continue;
 +
 +		for (slice = 1; slice <= 4; slice++) {
 +			sprintf(devname, "disk%ds%d:", unit, slice);
 +			fd = open(devname, O_RDONLY);
 +			if (fd == -1)
 +				continue;
 +			if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
 +				close(fd);
 +		}
 
   		for (slice = 1; slice <= 128; slice++) {
   			sprintf(devname, "disk%dp%d:", unit, slice);
   			fd = open(devname, O_RDONLY);
 -			if (fd == -1) {
 -				sprintf(devname, "disk%ds%d:", unit, slice);
 -				fd = open(devname, O_RDONLY);
 -				if (fd == -1)
 -					continue;
 -			}
 +			if (fd == -1)
 +				break;
   			if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
   				close(fd);
   		}
 
 -- 
 jimmy

From: "Andrey V. Elsukov" <bu7cher@yandex.ru>
To: bug-followup@FreeBSD.org, olgeni@freebsd.org
Cc:  
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
Date: Wed, 14 Jul 2010 19:35:32 +0400

 Hi,
 
 Just one note: a user can create a partition whose index doesn't start from 1.
 For example:
 	# mdconfig -s 100m
 	# gpart create -s gpt md0
 	# gpart add -i 5 -t freebsd-zfs md0
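
A minimal sketch of why this layout matters for a loop that stops at the first missing index. The boolean table and the function names (layout_single, probe_until_gap, probe_all) are hypothetical and only model which partition indexes exist; nothing here is taken from zfs.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_GPT_PARTS 128

/* Hypothetical helper: build a layout where only `index` (1-based) exists. */
void layout_single(bool used[MAX_GPT_PARTS], int index)
{
	assert(index >= 1 && index <= MAX_GPT_PARTS);
	memset(used, 0, MAX_GPT_PARTS * sizeof(used[0]));
	used[index - 1] = true;
}

/* Stop at the first missing index, as a "break" on open failure would. */
int probe_until_gap(const bool used[MAX_GPT_PARTS])
{
	int i, found = 0;

	for (i = 0; i < MAX_GPT_PARTS; i++) {
		if (!used[i])
			break;
		found++;
	}
	return (found);
}

/* Skip missing indexes, as a "continue" on open failure would. */
int probe_all(const bool used[MAX_GPT_PARTS])
{
	int i, found = 0;

	for (i = 0; i < MAX_GPT_PARTS; i++) {
		if (!used[i])
			continue;
		found++;
	}
	return (found);
}
```

For the md0 example (only index 5 in use), probe_until_gap() finds nothing because index 1 is missing, while probe_all() finds the one partition at the price of visiting all 128 indexes.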
 
 -- 
 WBR, Andrey V. Elsukov

From: Andriy Gapon <avg@icyb.net.ua>
To: bug-followup@FreeBSD.org, olgeni@FreeBSD.org
Cc:  
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
Date: Thu, 15 Jul 2010 11:46:31 +0300

 The last point is a good one.
 Perhaps the probe code should delve into the lower-level code and directly
 examine GPT in memory...
 
 -- 
 Andriy Gapon

From: Jimmy Olgeni <olgeni@FreeBSD.org>
To: "Andrey V. Elsukov" <bu7cher@yandex.ru>
Cc: bug-followup@FreeBSD.org, Andriy Gapon <avg@icyb.net.ua>,
        Norikatsu Shigemura <nork@FreeBSD.org>
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in
 /usr/src/sys/boot/zfs/zfs.c
Date: Mon, 26 Jul 2010 13:56:03 +0200 (CEST)

 Hello,
 
 On Wed, 14 Jul 2010, Andrey V. Elsukov wrote:
 
 > Just one note: a user can create a partition whose index doesn't start from 1.
 > For example:
 > 	# mdconfig -s 100m
 > 	# gpart create -s gpt md0
 > 	# gpart add -i 5 -t freebsd-zfs md0
 
 This is slower, but safer: it doesn't fall back to slices, and only 
 skips checks when a vdev is found at the unit level.
 
 --- zfs.c.orig	2010-06-14 04:09:06.000000000 +0200
 +++ zfs.c	2010-07-26 13:37:23.490536162 +0200
 @@ -408,20 +408,27 @@
 
   		/*
   		 * If we find a vdev, the zfs code will eat the fd, otherwise
 -		 * we close it.
 +		 * we close it and check for vdevs in slices and partitions.
   		 */
   		if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
   			close(fd);
 +		else
 +			continue;
 +
 +		for (slice = 1; slice <= 4; slice++) {
 +			sprintf(devname, "disk%ds%d:", unit, slice);
 +			fd = open(devname, O_RDONLY);
 +			if (fd == -1)
 +				continue;
 +			if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
 +				close(fd);
 +		}
 
   		for (slice = 1; slice <= 128; slice++) {
   			sprintf(devname, "disk%dp%d:", unit, slice);
   			fd = open(devname, O_RDONLY);
 -			if (fd == -1) {
 -				sprintf(devname, "disk%ds%d:", unit, slice);
 -				fd = open(devname, O_RDONLY);
 -				if (fd == -1)
 -					continue;
 -			}
 +			if (fd == -1)
 +				continue;
   			if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0))
   				close(fd);
   		}

From: Andriy Gapon <avg@icyb.net.ua>
To: Jimmy Olgeni <olgeni@FreeBSD.org>
Cc: "Andrey V. Elsukov" <bu7cher@yandex.ru>, bug-followup@FreeBSD.org,
        Norikatsu Shigemura <nork@FreeBSD.org>
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
Date: Mon, 26 Jul 2010 15:23:00 +0300

 on 26/07/2010 14:56 Jimmy Olgeni said the following:
 > 
 > Hello,
 > 
 > On Wed, 14 Jul 2010, Andrey V. Elsukov wrote:
 > 
 >> Just one note: a user can create a partition whose index doesn't start from 1.
 >> For example:
 >>     # mdconfig -s 100m
 >>     # gpart create -s gpt md0
 >>     # gpart add -i 5 -t freebsd-zfs md0
 > 
 > This is slower, but safer: it doesn't fall back to slices, and only
 > skips checks when a vdev is found at the unit level.
 
 I still think that we need a method for querying partition scheme and available
 partitions for a disk.  Perhaps for "biosdisk" only at this point.
 Just look into sys/boot/i386/libi386/biosdisk.c, bd_open/bd_open_gpt/bd_open_mbr
 to see what a waste each blind probe is.
 
 -- 
 Andriy Gapon

From: Jimmy Olgeni <olgeni@FreeBSD.org>
To: Andriy Gapon <avg@icyb.net.ua>
Cc: "Andrey V. Elsukov" <bu7cher@yandex.ru>, bug-followup@FreeBSD.org,
        Norikatsu Shigemura <nork@FreeBSD.org>
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in
 /usr/src/sys/boot/zfs/zfs.c
Date: Tue, 27 Jul 2010 14:06:42 +0200 (CEST)

 On Mon, 26 Jul 2010, Andriy Gapon wrote:
 
 > I still think that we need a method for querying partition scheme and available
 > partitions for a disk.  Perhaps for "biosdisk" only at this point.
 
 I poked around a bit...
 
 If we #include "../i386/libi386/libi386.h" (which seems a bit ugly 
 here) then we can examine the partition layout using 
 _data._gpt.gpt_nparts and _data._gpt.gpt_partitions from struct 
 open_disk:
 
    struct i386_devdesc *desc = ((struct i386_devdesc *)(files[fd].f_devdata));
 
    struct open_disk *od = desc->d_kind.biosdisk.data;
 
    /* get actual partition count and types */
 
 However, struct open_disk and gpt_part are local to biosdisk.c, so 
 they should probably be moved to a header.
 
 Maybe struct open_disk could be moved to stand.h where struct 
 open_file already resides?
 
 In that case we would only need to pull i386_devdesc from libi386.h 
 here.
 
 -- 
 jimmy

From: Andriy Gapon <avg@icyb.net.ua>
To: Jimmy Olgeni <olgeni@FreeBSD.org>
Cc: "Andrey V. Elsukov" <bu7cher@yandex.ru>, bug-followup@FreeBSD.org,
        Norikatsu Shigemura <nork@FreeBSD.org>
Subject: Re: bin/148296: [zfs] [loader] [patch] Very slow probe in /usr/src/sys/boot/zfs/zfs.c
Date: Tue, 27 Jul 2010 15:24:21 +0300

 on 27/07/2010 15:06 Jimmy Olgeni said the following:
 > 
 > On Mon, 26 Jul 2010, Andriy Gapon wrote:
 > 
 >> I still think that we need a method for querying partition scheme and
 >> available
 >> partitions for a disk.  Perhaps for "biosdisk" only at this point.
 > 
 > I poked around a bit...
 > 
 > If we #include "../i386/libi386/libi386.h" (which seems a bit ugly here)
 > then we can examine the partition layout using _data._gpt.gpt_nparts and
 > _data._gpt.gpt_partitions from struct open_disk:
 > 
 >   struct i386_devdesc *desc = ((struct i386_devdesc
 > *)(files[fd].f_devdata));
 > 
 >   struct open_disk *od = desc->d_kind.biosdisk.data;
 > 
 >   /* get actual partition count and types */
 > 
 > However, struct open_disk and gpt_part are local to biosdisk.c, so they
 > should probably be moved to a header.
 > 
 > Maybe struct open_disk could be moved to stand.h where struct open_file
 > already resides?
 > 
 > In that case we would only need to pull i386_devdesc from libi386.h here.
 
 Well, we could have some accessor functions that would provide the information
 rather than directly poking the internal disk structures...
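
A rough sketch of what such accessor functions might look like. Every name below (part_scheme, part_entry, disk_layout, disk_scheme, disk_nparts, disk_part, disk_zfs_candidates, sample_disk) is hypothetical; none of them exists in libstand or libi386, and the sample data simply mirrors the earlier gpart example with a single partition at index 5.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical, for illustration only. */
enum part_scheme { PART_SCHEME_NONE, PART_SCHEME_MBR, PART_SCHEME_GPT };

struct part_entry {
	int index;	/* 1-based slice or partition index */
	int is_zfs;	/* non-zero if the type is freebsd-zfs */
};

struct disk_layout {
	enum part_scheme scheme;
	size_t nparts;			/* entries actually in use */
	const struct part_entry *parts;
};

/* Accessors: callers never touch the structure fields directly. */
enum part_scheme disk_scheme(const struct disk_layout *d)
{
	return (d != NULL ? d->scheme : PART_SCHEME_NONE);
}

size_t disk_nparts(const struct disk_layout *d)
{
	return (d != NULL ? d->nparts : 0);
}

const struct part_entry *disk_part(const struct disk_layout *d, size_t n)
{
	return (n < disk_nparts(d) ? &d->parts[n] : NULL);
}

/* Count the entries that would be worth handing to a vdev probe. */
size_t disk_zfs_candidates(const struct disk_layout *d)
{
	size_t i, count = 0;

	for (i = 0; i < disk_nparts(d); i++) {
		const struct part_entry *p = disk_part(d, i);

		assert(p != NULL);
		if (p->is_zfs)
			count++;
	}
	return (count);
}

/* Sample data: one freebsd-zfs partition at index 5 on a GPT disk. */
static const struct part_entry sample_parts[] = { { 5, 1 } };
const struct disk_layout sample_disk = { PART_SCHEME_GPT, 1, sample_parts };
```

With accessors of this kind the probe loop could visit only the entries that exist (one in the sample) instead of opening up to 128 device names blindly.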
 
 -- 
 Andriy Gapon
State-Changed-From-To: open->closed 
State-Changed-By: olgeni 
State-Changed-When: Thu Sep 15 20:43:30 UTC 2011 
State-Changed-Why:  
The slow probing issue doesn't seem to exist on 8-STABLE anymore. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 
>Unformatted:
