From mmcg@mjolnir.cs.monash.edu.au  Wed Jun 25 01:53:23 1997
Received: from mjolnir.cs.monash.edu.au (heraclitus.cs.monash.edu.au [130.194.64.241])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id BAA11378
          for <FreeBSD-gnats-submit@freebsd.org>; Wed, 25 Jun 1997 01:53:16 -0700 (PDT)
Received: (from mmcg@localhost)
	by mjolnir.cs.monash.edu.au (8.8.5/8.8.5) id SAA00488;
	Wed, 25 Jun 1997 18:00:03 +1000 (EST)
Message-Id: <199706250800.SAA00488@mjolnir.cs.monash.edu.au>
Date: Wed, 25 Jun 1997 18:00:03 +1000 (EST)
From: Mike McGaughey <mmcg@heraclitus.cs.monash.edu.au>
Reply-To: mmcg@heraclitus.cs.monash.edu.au
To: FreeBSD-gnats-submit@freebsd.org
Cc: mmcg@heraclitus.cs.monash.edu.au
Subject: Erroneous wdc probe failure and possible fix
X-Send-Pr-Version: 3.2

>Number:         3949
>Category:       kern
>Synopsis:       The WD controller probe can fail when it shouldn't (and a plausible fix)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    sos
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jun 25 02:00:01 PDT 1997
>Closed-Date:    Fri Mar 19 02:23:45 PST 1999
>Last-Modified:  Fri Mar 19 02:24:09 PST 1999
>Originator:     Mike McGaughey
>Release:        FreeBSD 2.2.2-RELEASE i386
>Organization:
Monash University
>Environment:

3 IDE disk drives, including one as master on wdc1.

Diamond Data 16x CD-ROM, as slave on wdc1.  Interesting `feature'
of this drive - it seems to return an error condition (0x04 - I
have no idea what this means) until it has been properly initialised
by the ATAPI code.  The same thing seems to happen under DOS: the disk
light behaves the same under both operating systems, staying on
permanently until the CD driver is loaded (not a lot of evidence, but
different from my other ATAPI drives).

Here are the relevant parts of dmesg for my (now working) system:

FreeBSD 2.2.2-RELEASE #5: Wed Jun 25 17:05:39 EST 1997
    mmcg@mjolnir.cs.monash.edu.au:/usr/src/sys/compile/MJOLNIR
CPU: Pentium (119.75-MHz 586-class CPU)
[...]
chip2 <Intel 82371FB IDE interface> rev 2 on pci0:7:1
[...]
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <QUANTUM FIREBALL_TM2550A>, 32-bit, multi-block-16
wd0: 2445MB (5008752 sectors), 4969 cyls, 16 heads, 63 S/T, 512 B/S
wdc0: unit 1 (wd1): <NEC Corporation DSE1700A>, 32-bit, multi-block-16
wd1: 1627MB (3332448 sectors), 3306 cyls, 16 heads, 63 S/T, 512 B/S

[These next two messages were obtained by uncommenting the two
printf's in sys/i386/isa/wd.c:wdprobe() - MMCG]
WDC1 - Error : 81
WDC1 - Error (drv 1) : 4

wdc1 at 0x170-0x177 irq 15 on isa
wdc1: unit 0 (wd2): <QUANTUM SIROCCO2550A>
wd2: 2445MB (5008752 sectors), 4969 cyls, 16 heads, 63 S/T, 512 B/S
wdc1: unit 1 (atapi): </P61A>, removable, dma, iordy
wcd0: 1757Kb/sec, 120Kb cache, audio play, 255 volume levels, ejectable tray
wcd0: no disc inside, unlocked

>Description:

src/sys/i386/isa/wd.c:wdprobe() attempts to determine whether a
controller is present.  According to the comments in the source,
there are some controllers which return 0x81 (indicating drive 0
OK, drive 1 bad), but which will return `good' status for the
second drive if it is probed directly.  The code in wdprobe() attempts
to get around this by directly probing the second drive if 0x81
is returned.  Then, if the second drive returns an error status,
it merrily assumes there is no device, and (for some bizarre reason)
no controller.  Thus, those of us running two devices off wdc1 cannot
use either.

Now, if you got an 0x81 status return in the first place,
there probably *is* a controller present (and it was simply the
probe for disk 1 that failed) - if there were no controller,
surely the controller reset (earlier in the function) would have
failed instead?
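
A minimal sketch of the decision described above, for illustration only.
This is not the code in wd.c: the function names and the DIAG_* constants
are hypothetical, and the two arguments stand for the error-register
values read before and after drive 1 is selected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are the ones quoted in this report. */
#define DIAG_OK        0x01	/* "good" status */
#define DIAG_DRV1_FAIL 0x80	/* high bit set: drive 0 OK, drive 1 bad */

/*
 * Behaviour as described in this report: a bad status from the direct
 * probe of drive 1 makes the whole controller probe fail.
 */
bool
controller_present_original(uint8_t first, uint8_t drv1)
{
	if (first == DIAG_OK)
		return (true);
	if ((first & DIAG_DRV1_FAIL) == 0)
		return (false);		/* drive 0 fail */
	return (drv1 == 0x01 || drv1 == 0x81);
}

/*
 * Behaviour with the drive 1 test disabled: any 0x8x status is taken
 * as evidence that a controller answered, whatever drive 1 reports.
 */
bool
controller_present_patched(uint8_t first, uint8_t drv1)
{
	(void)drv1;			/* no longer consulted */
	if (first == DIAG_OK)
		return (true);
	return ((first & DIAG_DRV1_FAIL) != 0);
}
```

With the values from the dmesg above (0x81, then 0x04), the first
function rejects the controller and the second accepts it.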

>How-To-Repeat:

Attach an IDE CD-ROM drive that is known not to work with FreeBSD as a
slave drive on either controller.
In sys/i386/isa/wd.c:wdprobe(), uncomment the two `error' print
statements; compile and install a new kernel.  Reboot.

If you're lucky, when probing the controller with the ATAPI drive,
you'll see something like:

Error : 81
Error (drv 1) : 4

where the (drv 1) error is anything other than 0x81 or 0x01, and
the *controller* probe will fail.

>Fix:

I'm not altogether clear why the direct probe for drive 1 on the
controller is in there in the first place - as far as I can see,
we are not looking for a drive, but rather, for a controller (do
things fail if we have a controller with no attached drives?).

In any case, the quick fix for me was to put #if 0/#endif around the
test described above (and a comment).  I've included enough trailing
context here to patch it by hand; this is the last 30 lines of the
modified wdprobe():

			/*
			 * If drive 1 fails, why do we simply go to
			 * nodevice here?  Drive 0 may have been OK,
			 * because of the return status of 0x8x (and it
			 * not being due to an ATAPI slave), but the
			 * ATAPI itself could have failed for any number
			 * of reasons.  My Intel 82371FB reports 0x81
			 * before my ATAPI drive has been correctly
			 * initialised (the ATAPI drive isn't initialised
			 * until the ATAPI code probes it!).  And, as far
			 * as I'm concerned, getting a valid status
			 * return at all (0x81) implies we had a
			 * controller... - MMCG
			 */
#if 0
			if(du->dk_error != 0x01 && du->dk_error != 0x81)
				goto nodevice;
#endif
		} else	/* drive 0 fail */
			goto nodevice;
	}


	free(du, M_TEMP);
	return (IO_WDCSIZE);

nodevice:
	free(du, M_TEMP);
	return (0);
}
>Release-Note:
>Audit-Trail:

From: Brian Scott <Bscott@vitgssw.telstra.com.au>
To: freebsd-gnats-submit@freebsd.org, mmcg@heraclitus.cs.monash.edu.au
Cc:
Subject: Re: kern/3949: The WD controller probe can fail when it shouldn't (and a plausible fix)
Date: Thu, 04 Sep 97 10:00:15 1000

 Just a quick note to let you know that I have had exactly the same 
 problem and derived the same (working) fix.  Only difference is that on 
 my system everything is on wdc0.  Looks like it's the same model of CD-ROM 
 drive.  Actually, odds are that it was bought in the same shop in 
 Clayton (TECS).  Maybe they should put a note on the door.....
 
 regards,
 
 Brian
 
 
Responsible-Changed-From-To: freebsd-bugs->sos 
Responsible-Changed-By: phk 
Responsible-Changed-When: Thu Sep 18 23:21:07 PDT 1997 
Responsible-Changed-Why:  
Soren, will you take a look at this one?
State-Changed-From-To: open->closed 
State-Changed-By: sheldonh 
State-Changed-When: Fri Mar 19 02:23:45 PST 1999 
State-Changed-Why:  
Active development on 2.2 branch is over. 
>Unformatted:
