From nobody@FreeBSD.org  Sat Jul 15 19:54:06 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4B76016A4DA
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 15 Jul 2006 19:54:06 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E974C43D45
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 15 Jul 2006 19:54:05 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k6FJs5ad028259
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 15 Jul 2006 19:54:05 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k6FJs5Gh028258;
	Sat, 15 Jul 2006 19:54:05 GMT
	(envelope-from nobody)
Message-Id: <200607151954.k6FJs5Gh028258@www.freebsd.org>
Date: Sat, 15 Jul 2006 19:54:05 GMT
From: Guillaume Ballet <asqyzeron@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Fix: "Non-maskable interrupt while in kernel mode" with a TI firewire controller
X-Send-Pr-Version: www-2.3

>Number:         100356
>Category:       kern
>Synopsis:       [firewire] [patch] Non-maskable interrupt while in kernel mode with a TI firewire controller
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jul 15 20:00:33 GMT 2006
>Closed-Date:    Sun Jul 08 06:41:30 GMT 2007
>Last-Modified:  Sun Jul 08 06:41:30 GMT 2007
>Originator:     Guillaume Ballet
>Release:        All of them since 5.2 at least
>Organization:
>Environment:
GENERIC - and any kernel including the firewire driver
>Description:
At boot time, when the firewire driver initializes on a machine with a TI
controller (0x104c, 0x8032 at least), the following error message appears:

RAM parity error, likely hardware failure.
Fatal trap 19: non-maskable interrupt trap while in kernel mode.
instruction pointer = 0x20:0xc0528586
stack pointer       = 0x28:0xc10209c4
code segment        = base 0x0, limit 0xfffff, type 0x1b
                   = DPL0, pres 1, def32 1, gran 1
processor eflags    = interrupt enabled, IOPL = 0
current process     = 0 (swapper)
trap number         = 19
panic : non-maskable interrupt trap

This happens because the controller and/or the PCI bus does not react
quickly enough to the first OWRITE call (see the code below, from
sys/dev/firewire/fwohci.c).

312         OWRITE(sc, FWOHCI_INTSTATCLR, OHCI_INT_REG_FAIL);
313         fun = PHYDEV_RDCMD | (addr << PHYDEV_REGADDR);
314         OWRITE(sc, OHCI_PHYACCESS, fun);
315         for ( i = 0 ; i < MAX_RETRY ; i ++ ){
316                 fun = OREAD(sc, OHCI_PHYACCESS);
317                 if ((fun & PHYDEV_RDCMD) == 0 && (fun & PHYDEV_RDDONE) != 0)
318                         break;
319                 DELAY(100);
320         }

When the second OWRITE is performed, an uninitialized value ends up in
eax, and the following instruction fails:

<fwphy_rddata+156>:  mov    0xec(%eax),%eax

The debugger showed eax = 0xffffffff.

A read error on the PCI bus is wrongly interpreted as an ISA NMI error, and
the kernel crashes.

The problem seems to happen only when reading the speed register, which
points to a slow update on the PCI bus at initialization time.
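
For illustration, the polling loop quoted above can be modelled in user
space. This is only a minimal sketch, not driver code: mock_phy, mock_oread
and poll_read_done are hypothetical names, and the bit values and MAX_RETRY
are assumptions chosen for the example rather than copied from the driver
headers. It shows that the loop is bounded: it gives up after MAX_RETRY
polls instead of waiting forever for a slow controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch; not copied from the driver headers. */
#define PHYDEV_RDDONE	(1u << 31)
#define PHYDEV_RDCMD	(1u << 15)
#define MAX_RETRY	100

/*
 * Hypothetical stand-in for the controller: the read completes once the
 * access register has been polled polls_needed times.
 */
struct mock_phy {
	int polls_needed;
	int polls_seen;
};

/* Mock of OREAD(sc, OHCI_PHYACCESS) for the sketch. */
static uint32_t
mock_oread(struct mock_phy *phy)
{
	if (++phy->polls_seen >= phy->polls_needed)
		return (PHYDEV_RDDONE);	/* command bit clear, done bit set */
	return (PHYDEV_RDCMD);		/* read command still pending */
}

/* Returns the number of polls used, or -1 if the retry budget ran out. */
static int
poll_read_done(struct mock_phy *phy)
{
	uint32_t fun;
	int i;

	assert(phy != NULL);
	for (i = 0; i < MAX_RETRY; i++) {
		fun = mock_oread(phy);
		if ((fun & PHYDEV_RDCMD) == 0 && (fun & PHYDEV_RDDONE) != 0)
			return (i + 1);
		/* The driver calls DELAY(100) here; omitted in user space. */
	}
	return (-1);
}
```

A caller that checks for the -1 result can tell a read that never
completed from a valid one, instead of using whatever value is left in fun.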
>How-To-Repeat:
Insert any FreeBSD install CD into the drive at boot time. Wait. Enjoy :P
>Fix:
Fixing the problem is fairly simple: give the bus or the controller more
time to configure the right port. This is done by altering the code as
shown below:

OWRITE(sc, FWOHCI_INTSTATCLR, OHCI_INT_REG_FAIL);
fun = PHYDEV_RDCMD | (addr << PHYDEV_REGADDR);
if (addr == FW_PHY_SPD_REG)
        DELAY(500);
OWRITE(sc, OHCI_PHYACCESS, fun);
for ( i = 0 ; i < MAX_RETRY ; i ++ ){
        fun = OREAD(sc, OHCI_PHYACCESS);
        if ((fun & PHYDEV_RDCMD) == 0 && (fun & PHYDEV_RDDONE) != 0)
                break;
        DELAY(100);
}

As a diff against sys/dev/firewire/fwohci.c, this is:
313a314,315
>	if (addr == FW_PHY_SPD_REG)
>		DELAY(500);

It has been tested on several machines and works fine. Of course, it could
be more elegant to write dedicated code for determining the speed instead
of adding an if to a function that does not need it most of the time. I
prefer to let the maintainer of this file decide what is best.
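
One possible shape for such dedicated code is sketched below, as a
user-space model only. Everything here is hypothetical: mock_delay and
mock_phy_read stand in for DELAY() and fwphy_rddata(), phy_read_speed is an
invented helper, and the value of FW_PHY_SPD_REG and the 4-bit address
range are assumptions made for the example. The idea is that the settle
delay lives in a speed-specific helper, so the generic read path stays
unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define FW_PHY_SPD_REG	0x02	/* assumed value, for illustration only */
#define SPD_SETTLE_US	500	/* same 500 us settle time as the patch */

/* Total microseconds the mock delay was asked to wait. */
static unsigned int delayed_us;

/* Mock of DELAY(): records the request instead of busy-waiting. */
static void
mock_delay(unsigned int us)
{
	delayed_us += us;
}

/* Mock of the generic PHY register read: echoes the address back. */
static uint32_t
mock_phy_read(uint32_t addr)
{
	assert(addr <= 0x0f);	/* sketch assumes 4-bit register addresses */
	return (addr);
}

/* Hypothetical helper: wait for the PHY to settle, then read the speed. */
static uint32_t
phy_read_speed(void)
{
	mock_delay(SPD_SETTLE_US);
	return (mock_phy_read(FW_PHY_SPD_REG));
}
```

Note that this places the delay before the whole read sequence, whereas
the patch above places it between the two OWRITE calls; whether that
difference matters on the affected hardware has not been tested.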
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-i386->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Jul 17 04:19:17 UTC 2006 
Responsible-Changed-Why:  
This does not sound i386-specific. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100356 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/100356: commit references a PR
Date: Mon, 19 Mar 2007 03:35:52 +0000 (UTC)

 simokawa    2007-03-19 03:35:46 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/dev/firewire     fwohci.c 
   Log:
   Wait SCLK to be stable after LPS enabled.
   This should fix NMI problem in fwphy_rddata().
   
   PR: kern/94146 kern/100356
   MFC: after 3 days
   
   Revision  Changes    Path
   1.86      +2 -0      src/sys/dev/firewire/fwohci.c
 
State-Changed-From-To: open->feedback 
State-Changed-By: gavin 
State-Changed-When: Tue Jun 12 16:12:51 UTC 2007 
State-Changed-Why:  

To submitter: A fix has been committed for this to RELENG_6.  Are you 
able to retest to confirm that this fixes it for you? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100356 

From: Gavin Atkinson <gavin@FreeBSD.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/100356: [firewire] [patch] Non-maskable interrupt while
	in kernel mode with a TI firewire controller
Date: Tue, 03 Jul 2007 14:51:09 +0100

 Feedback received:
 
 -------- Forwarded Message --------
 From: Le Duc d'Asq-Yzeron <asqyzeron@gmail.com>
 Date: Sun, 1 Jul 2007 19:52:54 +0000
 
 It did work with 6.2-STABLE from June 2007.
 
 Regards,
 Guillaume Ballet
State-Changed-From-To: feedback->closed 
State-Changed-By: linimon 
State-Changed-When: Sun Jul 8 06:41:17 UTC 2007 
State-Changed-Why:  
Submitter notes that this has been fixed. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100356 
>Unformatted:
