From nobody@FreeBSD.org  Fri Apr 11 19:09:16 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4EC00106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 11 Apr 2008 19:09:16 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 3347A8FC19
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 11 Apr 2008 19:09:16 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m3BJ8r2E079929
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 11 Apr 2008 19:08:53 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m3BJ8rEj079928;
	Fri, 11 Apr 2008 19:08:53 GMT
	(envelope-from nobody)
Message-Id: <200804111908.m3BJ8rEj079928@www.freebsd.org>
Date: Fri, 11 Apr 2008 19:08:53 GMT
From: Mike Hibler <mike@flux.utah.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: FreeBSD boot loader doesn't work on Dell R900 (+workaround)
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         122668
>Category:       i386
>Synopsis:       [boot] FreeBSD boot loader doesn't work on Dell R900 (+workaround)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    jhb
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 11 19:10:01 UTC 2008
>Closed-Date:    Fri Jun 27 01:00:49 UTC 2008
>Last-Modified:  Fri Jun 27 01:00:49 UTC 2008
>Originator:     Mike Hibler
>Release:        6.2-RELEASE
>Organization:
University of Utah, Flux Research Group
>Environment:
N/A
>Description:
As far as I can tell, this isn't a bug in the BSD boot loader; rather, it is
a bug in the Dell BIOS.  However, googling around I see that other people have
seen this problem, and since I have worked around it, I thought I would report it.

Note also that I am seeing this bug in the Emulab boot loader, which is derived
from the FreeBSD 6.2-RELEASE boot loader, but based on the posts I have seen I
believe the problem would be the same in the stock FreeBSD loader.

The symptom is that when I try to boot over the network using PXE
(currdev="pxe0:"), the loader complains that it "cannot load kernel".

The problem is that on this machine one of the BIOS calls (INT 15h, function
0xE820) in bios_getsmap (src/sys/boot/i386/libi386/biossmap.c) returns more
than the 20 bytes of data it is supposed to: it appears to write the value
0x09 into the 21st byte, i.e. the first byte past the end of the entry (on
little-endian x86 that is the low-order byte of whatever 32-bit word follows).
As the data are read into a 20-byte static buffer, the variable that follows
it gets clobbered.  In this case 'smap' is the buffer, and the following
BSS-allocated variable is 'smapbase':

static struct bios_smap smap;
static struct bios_smap *smapbase;

smapbase points to the dynamically allocated array into which the individual
SMAP entries are copied via:

                bcopy(&smap, &smapbase[smaplen], sizeof(struct bios_smap));

What I see is that the first couple of read-an-entry, copy-to-buffer
iterations work fine, but then one call returns the extra data and the
low-order byte of smapbase changes from something like 0xb4 to 0x09.
The result is still a valid address, so the bcopy proceeds without incident,
but the SMAP entry winds up being bcopy()ed to an earlier address,
overwriting other malloc()ed memory.

In this case it is overwriting some entries in the 'environ' environment
linked list, corrupting the chain.  The result is that I no longer have
a "currdev" environment variable, so the loader tries to load from
the default device (the hard drive) rather than the network.  Since there is
nothing on the hard drive, it cannot read loader.rc or boot.conf or ..., and
ultimately winds up trying to load "kernel", which fails with an error.

Note that there are two read-data loops in this function, and the problem
occurs in the first loop as well, but there it does not matter: the first
loop only counts entries, smapbase has not yet been allocated, and no bcopy
happens.

Note also that one post I read mentioned that another BSD boots fine on
this machine.  That could be because it reads the data directly into the
smapbase array rather than via a temporary smap buffer, so what gets
clobbered with 0x09 there is just a later, yet-to-be-filled part of the
smapbase array.

>How-To-Repeat:
Try booting the loader on a Dell R900 (for example via PXE, as above).
>Fix:
The workaround is to (arbitrarily) pad the temporary smap buffer with
another 4 bytes.  I tried padding with up to 32 extra bytes, but never saw
more than the single overwrite, and that was always within bytes 0-3 past
the end of the structure.


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->patched 
State-Changed-By: jhb 
State-Changed-When: Sat Jun 7 03:07:50 UTC 2008 
State-Changed-Why:  
Thanks for the excellent analysis.  I was testing an R900 today, and this
PR saved me a lot of time when another developer pointed me at it.
I've committed a workaround to HEAD and will MFC it in a few days.


Responsible-Changed-From-To: freebsd-i386->jhb 
Responsible-Changed-By: jhb 
Responsible-Changed-When: Sat Jun 7 03:07:50 UTC 2008 
Responsible-Changed-Why:  
Thanks for the excellent analysis.  I was testing an R900 today, and this
PR saved me a lot of time when another developer pointed me at it.
I've committed a workaround to HEAD and will MFC it in a few days.

http://www.freebsd.org/cgi/query-pr.cgi?pr=122668 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: i386/122668: commit references a PR
Date: Sat,  7 Jun 2008 03:07:50 +0000 (UTC)

 jhb         2008-06-07 03:07:32 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/boot/i386/libi386 biossmap.c 
   Log:
   SVN rev 179631 on 2008-06-07 03:07:32Z by jhb
   
   Work around a bug in the BIOS of Dell R900 machines.  Specifically, each
   entry in the SMAP is a 20-byte structure, and the entries are queried from
   the BIOS via successive BIOS calls.  Due to an apparent bug in the R900's
   BIOS, for some SMAP requests the BIOS overflows the 20-byte buffer,
   trashing a few bytes of memory immediately after the SMAP structure.  As
   a workaround, add 8 bytes of padding after the SMAP structure used in
   the loader for SMAP queries.
   
   PR:             i386/122668
   Submitted by:   Mike Hibler  mike flux.utah.edu, silby
   MFC after:      3 days
   
   Revision  Changes    Path
   1.8       +4 -1      src/sys/boot/i386/libi386/biossmap.c
 
State-Changed-From-To: patched->closed 
State-Changed-By: jhb 
State-Changed-When: Fri Jun 27 01:00:35 UTC 2008 
State-Changed-Why:  
Fix merged to RELENG_[67]. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=122668 
>Unformatted:
