From nobody@FreeBSD.org  Wed Mar 17 16:47:41 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AEFBA106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 17 Mar 2010 16:47:41 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 9D6168FC12
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 17 Mar 2010 16:47:41 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o2HGlfnK006978
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 17 Mar 2010 16:47:41 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o2HGlf6Q006977;
	Wed, 17 Mar 2010 16:47:41 GMT
	(envelope-from nobody)
Message-Id: <201003171647.o2HGlf6Q006977@www.freebsd.org>
Date: Wed, 17 Mar 2010 16:47:41 GMT
From: Gilles Blanc <gblanc@linagora.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: boot problem on USB (root partition mounting)
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         144824
>Category:       kern
>Synopsis:       [boot] [patch] boot problem on USB (root partition mounting)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 17 16:50:01 UTC 2010
>Closed-Date:    
>Last-Modified:  Wed Aug 04 01:19:43 UTC 2010
>Originator:     Gilles Blanc
>Release:        8.0-RELEASE (current)
>Organization:
Linagora
>Environment:
FreeBSD freedaemon.par.lng 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
At boot, the system (file /sys/kern/vfs_mount.c) uses a queue to wait for devices to be initialized before mounting root (or trying to do so). This queue is filled, for instance, by the USB driver (using the "root_mount_hold" function), so if we boot from a USB key, the function "root_mount_prepare" holds back the root mount until USB is available (that is to say, until the queue has been emptied by calling "root_mount_rel" on all the identifiers registered by the USB driver).

Actually, it only waits for USB to be "physically" available, but not necessarily for umass or SCSI (scsi-da). To be more precise, the behavior is not deterministic: to be mounted, a root partition on a USB key needs umass and then SCSI to be initialized. If the mount works most of the time, it is only because the 'root_holds' list is not yet empty while other threads are running concurrently (for example, with a USB key plugged into usb0, the system sequentially initializes usb0 to usb7, and during that time umass0 and da0 are initialized too).
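
To illustrate the mechanism, here is a minimal userland sketch (not kernel code) of a hold list of this kind. The names model_hold, model_rel and model_may_mount are hypothetical and only mirror the roles of "root_mount_hold", "root_mount_rel" and the emptiness check described above; the real kernel functions differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userland model only: one entry per pending hold, as in 'root_holds'. */
struct hold_token {
	const char		*who;	/* identifier of the holder */
	struct hold_token	*next;
};

static struct hold_token *holds;	/* empty list: root mount may proceed */

/* Model of taking a hold: push a token and hand it back to the caller. */
static struct hold_token *
model_hold(const char *who)
{
	struct hold_token *h = malloc(sizeof(*h));

	if (h == NULL)
		abort();
	h->who = who;
	h->next = holds;
	holds = h;
	return (h);
}

/* Model of releasing a hold: unlink exactly the token that was returned. */
static void
model_rel(struct hold_token *h)
{
	struct hold_token **pp;

	assert(h != NULL);
	for (pp = &holds; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			free(h);
			return;
		}
	}
}

/* The mount thread only checks whether any hold is still pending. */
static int
model_may_mount(void)
{
	return (holds == NULL);
}
```

In this model the mount is allowed as soon as the last token is released, whoever released it. Nothing ties the empty list to the disk device actually being present.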

Unfortunately, some servers are not that kind, and root mounting just fails (the 'vfs_mountroot' function asks 'vfs_mountroot_try' to mount the USB root partition, which is not yet available). We then end up at the "ROOT MOUNT ERROR" prompt and have to mount our partition by hand, which is not acceptable on production servers (we would have to travel some kilometers just to type "ufs:/dev/da0s1a" each time we reboot...).

The problem is not blocking for most FreeBSD users, but it prevents us from migrating our systems (which is quite a big problem).
>How-To-Repeat:
If you have a machine presenting this problem, you can reproduce it easily (it fails 95% of the time); if not (as on my development laptop), you will never manage to make it fail.
>Fix:
I have tried to add locks in the umass and SCSI drivers. In the umass driver, it is in the /sys/dev/usb/storage/umass.c file, in function 'umass_attach' (on our Supermicro server, umass has enough time to initialize, but I wanted to be rigorous). In the SCSI driver, it is in the /sys/cam/scsi/scsi_da.c file, in function 'dastart', in the "DA_STATE_PROBE2" case of the switch. Unfortunately, between these two lock/unlock pairs, the root mounting thread preempts, and as the list is empty during this very short time, it tries to mount the root partition and fails as usual. It is not possible to add a lock in umass and remove it in SCSI, because the removal API works with pointers into the lock list.
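
The window between the two pairs can be shown with a small replay model. This is only a sketch under simplifying assumptions: the step names and first_unprotected_step are hypothetical, holds are reduced to a counter, and no other driver is assumed to hold the list at that moment.

```c
#include <assert.h>
#include <stddef.h>

enum step { UMASS_HOLD, UMASS_REL, DA_HOLD, DA_REL };

/*
 * Replay a sequence of hold/release steps and return the index of the
 * first step after which no hold is pending while the disk is not yet
 * ready (-1 if there is no such step).
 */
static int
first_unprotected_step(const enum step *seq, size_t n)
{
	int pending = 0;	/* number of outstanding holds */
	int disk_ready = 0;	/* set once the da probe has completed */
	size_t i;

	for (i = 0; i < n; i++) {
		switch (seq[i]) {
		case UMASS_HOLD:
		case DA_HOLD:
			pending++;
			break;
		case UMASS_REL:
			pending--;
			break;
		case DA_REL:
			pending--;
			disk_ready = 1;
			break;
		}
		assert(pending >= 0);
		if (pending == 0 && !disk_ready)
			return ((int)i);
	}
	return (-1);
}
```

With the order UMASS_HOLD, UMASS_REL, DA_HOLD, DA_REL the function reports step 1: the list is empty right after umass releases its hold and before da takes its own. An overlapping order (DA_HOLD before UMASS_REL) would close the window, but it is shown here for contrast only, since it is the arrangement the API does not make practical.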

So another solution has to be considered, which is what I propose with this patch. Simply, in vfs_mountroot_try, I call the 'kernel_mount' function several times, with a short pause in between. The number of retries is 3 by default, but can be customized through the new "vfs.root.mounttrymax" option in /boot/loader.conf (it can even be set to 0 to go back to the initial behavior). Each time the mount fails and a retry is still allowed, a message is printed, the thread sleeps for one second, and then tries again. If it is really impossible to mount root, then we continue with the normal prompt.
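
The counting can be sketched in userland as follows. This is an illustrative model, not the patch itself: mount_with_retries and fake_mount are hypothetical names, the sleep and the message are left out, and the error value is arbitrary. As the loop is written, the limit counts retries after a first attempt, so a limit of 3 allows up to 4 calls and a limit of 0 allows exactly one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mount attempt: returns 0 on success. */
typedef int (*mount_fn)(void *arg);

/*
 * Try once, then retry up to 'max_retries' more times while the attempt
 * keeps failing. Returns the last error; '*attempts' gets the call count.
 */
static int
mount_with_retries(mount_fn try_mount, void *arg, int max_retries,
    int *attempts)
{
	int error, retries = 0;

	assert(try_mount != NULL && attempts != NULL);
	*attempts = 0;
	for (;;) {
		error = try_mount(arg);
		(*attempts)++;
		if (error == 0 || retries >= max_retries)
			break;
		/* The patch prints a message and sleeps one second here. */
		retries++;
	}
	return (error);
}

/* Fake device that becomes ready after '*remaining' failed attempts. */
static int
fake_mount(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (19);	/* arbitrary nonzero error for the model */
	}
	return (0);
}
```

With the patch applied, the limit would be raised with a line such as vfs.root.mounttrymax="5" in /boot/loader.conf. The sketch uses an int counter; a char counter, as in the patch, would restrict the range of values that can safely be used for the tunable.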

Actually, there are still some problems on some USB ports (the other ones on the same machine work fine at the first or second mount retry). I suspect a deeper problem in 'kernel_mount', because using the prompt doesn't mount the device either, or worse, can lead to a page fault or a lock-up. But my patch is enough to resolve the original problem as far as is possible in the current state of things.

I hope it will be reviewed and accepted as soon as possible.

Patch attached with submission follows:

--- vfs_mount.c	2010-03-17 15:30:45.000000000 +0100
+++ vfs_mount.c	2010-03-17 14:49:52.000000000 +0100
@@ -1798,6 +1806,8 @@
 	int		error;
 	char		patt[32];
 	char		errmsg[255];
+	char		nbtry;
+	int		rootmounttrymax;
 
 	vfsname = NULL;
 	path    = NULL;
@@ -1805,6 +1815,8 @@
 	ma	= NULL;
 	error   = EINVAL;
 	bzero(errmsg, sizeof(errmsg));
+	nbtry	= 0;
+	rootmounttrymax = 3;
 
 	if (mountfrom == NULL)
 		return (error);		/* don't complain */
@@ -1827,7 +1839,18 @@
 	ma = mount_arg(ma, "errmsg", errmsg, sizeof(errmsg));
 	ma = mount_arg(ma, "ro", NULL, 0);
 	ma = parse_mountroot_options(ma, options);
-	error = kernel_mount(ma, MNT_ROOTFS);
+
+	TUNABLE_INT_FETCH("vfs.root.mounttrymax", &rootmounttrymax);
+	while (1) {
+		error = kernel_mount(ma, MNT_ROOTFS);
+		if (nbtry < rootmounttrymax && error != 0) {
+			printf("Mount failed, retrying mount root from %s\n", mountfrom);
+			tsleep(&rootmounttrymax, PZERO | PDROP, "mount", hz);
+			nbtry++;
+		}
+		else
+			break;
+	}
 
 	if (error == 0) {
 		/*


>Release-Note:
>Audit-Trail:

From: Arseny Nasokin <eirnym@gmail.com>
To: "bug-followup@FreeBSD.org" <bug-followup@FreeBSD.org>,
 "gblanc@linagora.com" <gblanc@linagora.com>
Cc:  
Subject: Re: kern/144824: [boot] [patch] boot problem on USB (root partition mounting)
Date: Wed, 31 Mar 2010 12:29:54 +0400

 I have the same issue (usb/145184). I've tried your patch, but it doesn't
 work :(
 
 --
   With pleasure

From: Daniel Hartmeier <daniel@benzedrine.cx>
To: bug-followup@FreeBSD.org
Cc: gblanc@linagora.com
Subject: Re: kern/144824: [boot] [patch] boot problem on USB (root partition mounting)
Date: Fri, 16 Jul 2010 12:56:05 +0200

 You have to move the ma initialization inside the retry loop,
 because kernel_mount() frees it, otherwise I get a kernel panic.
 
 With that changed, the patch solves the issue with an Intel
 S5000PAL board booting from USB, where da0 attaches slightly
 too late. Possibly related to the RMM2 (remote management
 module), which attaches multiple (virtual) CD-ROM drives to
 USB, which produce CAM/SCSI status errors.
 
 Daniel
 
 --- vfs_mount.c 30 Jan 2010 12:11:21 -0000      1.312.2.3
 +++ vfs_mount.c 16 Jul 2010 10:38:46 -0000
 @@ -1798,6 +1798,8 @@
         int             error;
         char            patt[32];
         char            errmsg[255];
 +       char            nbtry;
 +       int             rootmounttrymax;
 
         vfsname = NULL;
         path    = NULL;
 @@ -1805,6 +1807,8 @@
         ma      = NULL;
         error   = EINVAL;
         bzero(errmsg, sizeof(errmsg));
 +       nbtry   = 0;
 +       rootmounttrymax = 3;
 
         if (mountfrom == NULL)
                 return (error);         /* don't complain */
 @@ -1821,13 +1825,23 @@
         if (path[0] == '\0')
                 strcpy(path, ROOTNAME);
 
 -       ma = mount_arg(ma, "fstype", vfsname, -1);
 -       ma = mount_arg(ma, "fspath", "/", -1);
 -       ma = mount_arg(ma, "from", path, -1);
 -       ma = mount_arg(ma, "errmsg", errmsg, sizeof(errmsg));
 -       ma = mount_arg(ma, "ro", NULL, 0);
 -       ma = parse_mountroot_options(ma, options);
 -       error = kernel_mount(ma, MNT_ROOTFS);
 +       while (1) {
 +               ma = NULL;
 +               ma = mount_arg(ma, "fstype", vfsname, -1);
 +               ma = mount_arg(ma, "fspath", "/", -1);
 +               ma = mount_arg(ma, "from", path, -1);
 +               ma = mount_arg(ma, "errmsg", errmsg, sizeof(errmsg));
 +               ma = mount_arg(ma, "ro", NULL, 0);
 +               ma = parse_mountroot_options(ma, options);
 +               error = kernel_mount(ma, MNT_ROOTFS);
 +               if (nbtry < rootmounttrymax && error != 0) {
 +                       printf("Mount failed, retrying mount root from %s\n",
 +                           mountfrom);
 +                       tsleep(&rootmounttrymax, PZERO | PDROP, "mount", hz);
 +                       nbtry++;
 +               } else
 +                       break;
 +       }
 
         if (error == 0) {
                 /*
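
The ownership point made in this follow-up (the mount call frees its argument list, so the list has to be rebuilt before each attempt) can be modeled in userland. This is a sketch under assumptions: struct args, build_args, consume_mount and retry_mount are hypothetical stand-ins, not the kernel's mount_arg()/kernel_mount() interfaces.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the mount argument list. */
struct args {
	char	*from;
};

static int builds;	/* how many argument lists were built */

static struct args *
build_args(const char *from)
{
	struct args *a = malloc(sizeof(*a));
	size_t len = strlen(from) + 1;

	assert(a != NULL);
	a->from = malloc(len);
	assert(a->from != NULL);
	memcpy(a->from, from, len);
	builds++;
	return (a);
}

/*
 * Stand-in for a consuming call: it always frees its argument, whether it
 * succeeds or not. Fails until '*failures_left' reaches zero.
 */
static int
consume_mount(struct args *a, int *failures_left)
{
	int error = (*failures_left > 0) ? 1 : 0;

	if (error)
		(*failures_left)--;
	free(a->from);
	free(a);
	return (error);
}

/* Retry loop that rebuilds the arguments before every attempt. */
static int
retry_mount(const char *from, int max_retries, int *failures_left)
{
	int error, retries = 0;

	for (;;) {
		struct args *a = build_args(from);	/* fresh each time */

		error = consume_mount(a, failures_left);
		if (error == 0 || retries >= max_retries)
			return (error);
		retries++;
	}
}
```

Building the list once outside the loop would pass an already freed pointer to the second attempt, which matches the panic reported with the original patch.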
 
Date: Tue, 03 Aug 2010 11:56:16 +0400
From: Grigory Rechistov <ggg_mail@inbox.ru>
Reply-To: Grigory Rechistov <ggg_mail@inbox.ru>
To: bug-followup@FreeBSD.org,
	gblanc@linagora.com
Subject: Re: kern/144824: [boot] [patch] boot problem on USB (root partition mounting)

 Experienced this issue on FreeBSD 8.1-RELEASE i386, see this bug
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=usb/143790
 
 for additional details.
>Unformatted:
