From nobody@FreeBSD.org  Fri Nov 16 22:13:41 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 45FA816A419
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Nov 2007 22:13:41 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 3221813C4CC
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Nov 2007 22:13:41 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.1/8.14.1) with ESMTP id lAGMD5FA057296
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Nov 2007 22:13:05 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.1/8.14.1/Submit) id lAGMD5bP057295;
	Fri, 16 Nov 2007 22:13:05 GMT
	(envelope-from nobody)
Message-Id: <200711162213.lAGMD5bP057295@www.freebsd.org>
Date: Fri, 16 Nov 2007 22:13:05 GMT
From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: firewire bus reset hogs CPU, causing data to be lost
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         118093
>Category:       kern
>Synopsis:       [firewire] firewire bus reset hogs CPU, causing data to be lost
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-firewire
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Nov 16 22:20:01 UTC 2007
>Closed-Date:    
>Last-Modified:  Thu Feb  3 22:00:20 UTC 2011
>Originator:     Dieter
>Release:        6.2
>Organization:
>Environment:
6.2-RELEASE amd64
>Description:
When a firewire device is powered on or off, the firewire bus is reset.
A firewire bus reset hogs the CPU for too long, more than 0.1 second, causing
data to be lost.  Disk and Ethernet activity (and probably all i/o) stop.
Stopping Ethernet for too long causes data to be lost.  Userland processes
also don't run, but they have to expect that, it is the devices not getting
serviced that I'm concerned about.

>How-To-Repeat:
Have some i/o going.
fwcontrol -r
observe that all i/o stops for 1/10 second
>Fix:


>Release-Note:
>Audit-Trail:

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, freebsd-drivers@freebsd.org
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Sun, 14 Dec 2008 19:30:04 +0000

 I found the source of this problem.  When a firewire bus resets,
 the firewire driver prints a few lines to the console,
 using printf(9) and device_printf(9).  I suspect that these are
 running at splfw aka splimp, locking out other i/o.
 
 Commenting out the *printf() calls fixes the problem, but that
 isn't a good solution.
 
 Would changing the *printf() calls to log(9) calls be safe?
 ("safe" meaning other i/o doesn't get locked out)
 
 Ah, for the good old days when 19200 baud seemed fast...

From: Sean Bruno <sbruno@miralink.com>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: freebsd-firewire@freebsd.org, bug-followup@FreeBSD.org
Subject: Re: kern/118093: firewire bus reset hogs CPU,	causing data to be
 lost
Date: Mon, 15 Dec 2008 21:14:32 -0800

 Dieter wrote:
 > I found the source of this problem.  When a firewire bus resets,
 > the firewire driver prints a few lines to the console,
 > using printf(9) and device_printf(9).  I suspect that these are
 > running at splfw aka splimp, locking out other i/o.
 >
 > Commenting out the *printf() calls fixes the problem, but that
 > isn't a good solution.
 >
 > Would changing the *printf() calls to log(9) calls be safe?
 > ("safe" meaning other i/o doesn't get locked out)
 >
 > Ah, for the good old days when 19200 baud seemed fast...
 > _______________________________________________
 >   
 Which one are you looking at Dieter?
 
 Sean

From: Katsushi Kobayashi <ikob@ni.aist.go.jp>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: freebsd-firewire@freebsd.org,
 freebsd-drivers@freebsd.org,
 bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU,	causing data to be lost
Date: Tue, 16 Dec 2008 14:17:10 +0900

 Hi, 
 
 I believe no side effect is anticipated the message.
 "side effect" includes locking i/o.
 
 ----
 Katsushi Kobayashi
 
 
 
 
 On 2008/12/15, at 4:30, Dieter wrote:
 
 > I found the source of this problem.  When a firewire bus resets,
 > the firewire driver prints a few lines to the console,
 > using printf(9) and device_printf(9).  I suspect that these are
 > running at splfw aka splimp, locking out other i/o.
 >
 > Commenting out the *printf() calls fixes the problem, but that
 > isn't a good solution.
 >
 > Would changing the *printf() calls to log(9) calls be safe?
 > ("safe" meaning other i/o doesn't get locked out)
 >
 > Ah, for the good old days when 19200 baud seemed fast...
 > _______________________________________________
 > freebsd-firewire@freebsd.org mailing list
 > http://lists.freebsd.org/mailman/listinfo/freebsd-firewire
 > To unsubscribe, send any mail to "freebsd-firewire-unsubscribe@freebsd.org"
 

From: Warner Losh <imp@bsdimp.com>
To: freebsd@sopwith.solgatos.com
Cc: freebsd-firewire@FreeBSD.org, freebsd-drivers@FreeBSD.org,
        bug-followup@FreeBSD.org
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be
 lost 
Date: Mon, 15 Dec 2008 23:01:30 -0700 (MST)

 > I found the source of this problem.  When a firewire bus resets,
 > the firewire driver prints a few lines to the console,
 > using printf(9) and device_printf(9).  I suspect that these are
 > running at splfw aka splimp, locking out other i/o.
 
 This can't be the case.  There's no SPL involved at all.  Maybe
 removing the printfs causes an interrupt to be serviced faster,
 resulting in what appears to be better performance...
 
 > Commenting out the *printf() calls fixes the problem, but that
 > isn't a good solution.
 > 
 > Would changing the *printf() calls to log(9) calls be safe?
 > ("safe" meaning other i/o doesn't get locked out)
 > 
 > Ah, for the good old days when 19200 baud seemed fast...
 > _______________________________________________
 > freebsd-drivers@freebsd.org mailing list
 > http://lists.freebsd.org/mailman/listinfo/freebsd-drivers
 > To unsubscribe, send any mail to "freebsd-drivers-unsubscribe@freebsd.org"
 > 
 > 

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@FreeBSD.org, freebsd-drivers@FreeBSD.org,
	bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Tue, 16 Dec 2008 19:29:13 +0000

 Sean>  Which file in dev/firewire are you looking at?
 
 fwohci.c  and  firewire.c
 
 examples:
 
 printf("non CYCLEMASTER mode\n");
 
 device_printf(fc->dev, "Initiate bus reset\n");
 
 -------------------
 
 Warner>  This can't be the case.  There's no SPL involved at all.  Maybe
 Warner>  removing the printfs causes an interrupt to be serviced faster,
 Warner>  resulting in what appears to be better performance...
 
 With the printfs, Ethernet is not getting serviced.  This
 is repeatable and easily reproduced.   Without the printfs,
 it seems ok.
 
 If it isn't spl, what is locking out Ethernet?

From: "M. Warner Losh" <imp@bsdimp.com>
To: freebsd@sopwith.solgatos.com
Cc: freebsd-firewire@freebsd.org, freebsd-drivers@freebsd.org,
        bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be
 lost 
Date: Tue, 16 Dec 2008 23:59:41 -0700 (MST)

 In message: <200812170329.DAA17848@sopwith.solgatos.com>
             Dieter <freebsd@sopwith.solgatos.com> writes:
 : Sean>  Which file in dev/firewire are you looking at?
 : 
 : fwohci.c  and  firewire.c
 : 
 : examples:
 : 
 : printf("non CYCLEMASTER mode\n");
 : 
 : device_printf(fc->dev, "Initiate bus reset\n");
 : 
 : -------------------
 : 
 : Warner>  This can't be the case.  There's no SPL involved at all.  Maybe
 : Warner>  removing the printfs causes an interrupt to be serviced faster,
 : Warner>  resulting in what appears to be better performance...
 : 
 : With the printfs, Ethernet is not getting serviced.  This
 : is repeatable and easily reproduced.   Without the printfs,
 : it seems ok.
 : 
 : If it isn't spl, what is locking out Ethernet?
 
 interrupt storm?  In old-spl-locked drivers, often times the interrupt
 would be masked while certain operations happened.  In new
 mutex-locked freebsd, there's no way to block the interrupts, so
 sometimes old code gets into an interrupt storm, which starves other
 things.  Not sure why printf would trigger this, however...
 
 Warner

From: Sean Bruno <sbruno@miralink.com>
To: "M. Warner Losh" <imp@bsdimp.com>
Cc: freebsd@sopwith.solgatos.com, freebsd-firewire@freebsd.org, 
 bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be
 lost
Date: Wed, 17 Dec 2008 17:10:53 -0800

 M. Warner Losh wrote:
 > In message: <200812170329.DAA17848@sopwith.solgatos.com>
 >             Dieter <freebsd@sopwith.solgatos.com> writes:
 > : Sean>  Which file in dev/firewire are you looking at?
 > : 
 > : fwohci.c  and  firewire.c
 > : 
 > : examples:
 > : 
 > : printf("non CYCLEMASTER mode\n");
 > : 
 > : device_printf(fc->dev, "Initiate bus reset\n");
 > : 
 > : -------------------
 > : 
 > : Warner>  This can't be the case.  There's no SPL involved at all.  Maybe
 > : Warner>  removing the printfs causes an interrupt to be serviced faster,
 > : Warner>  resulting in what appears to be better performance...
 > : 
 > : With the printfs, Ethernet is not getting serviced.  This
 > : is repeatable and easily reproduced.   Without the printfs,
 > : it seems ok.
 > : 
 > : If it isn't spl, what is locking out Ethernet?
 >
 > interrupt storm?  In old-spl-locked drivers, often times the interrupt
 > would be masked while certain operations happened.  In new
 > mutex-locked freebsd, there's no way to block the interrupts, so
 > sometimes old code gets into an interrupt storm, which starves other
 > things.  Not sure why printf would trigger this, however...
 >
 > Warner
 > _______________________________________________
 > freebsd-drivers@freebsd.org mailing list
 > http://lists.freebsd.org/mailman/listinfo/freebsd-drivers
 > To unsubscribe, send any mail to "freebsd-drivers-unsubscribe@freebsd.org"
 >   
 Let me take a stab at this one this week.  There's got to be something 
 more going on
 than a printf() causing hell.  It could be a mutex being held causing 
 nightmares.
 
 The firewire code is my current project, I'll look at this issue now.
 
 Sean

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Cc:  
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Thu, 18 Dec 2008 00:51:41 +0000

 > > : Warner>  This can't be the case.  There's no SPL involved at all.  Maybe
 > > : Warner>  removing the printfs causes an interrupt to be serviced faster,
 > > : Warner>  resulting in what appears to be better performance...
 > > : 
 > > : With the printfs, Ethernet is not getting serviced.  This
 > > : is repeatable and easily reproduced.   Without the printfs,
 > > : it seems ok.
 > > : 
 > > : If it isn't spl, what is locking out Ethernet?
 > >
 > > interrupt storm?  In old-spl-locked drivers, often times the interrupt
 > > would be masked while certain operations happened.  In new
 > > mutex-locked freebsd, there's no way to block the interrupts, so
 > > sometimes old code gets into an interrupt storm, which starves other
 > > things.  Not sure why printf would trigger this, however...
 
 I tried logging in on the console and "cat /etc/termcap" does
 not create a problem.  Systat -vmstat reports that IRQ 5 (console)
 interrupts go from 0 to 120.  CPU:
 	0.0%Sys   0.6%Intr  0.0%User  0.0%Nice 99.4%Idle
 
 The normal firewire bus reset printfs generate 890 IRQ 5 interrupts
 with the shell firing off "fwcontrol -u 1 -r" in a loop.
 CPU goes from idle to 100% busy:
 	98.1%Sys   1.7%Intr  0.3%User  0.0%Nice  0.0%Idle
 
 "fwcontrol -u 0 -r" (the onboard firewire controller) gives
 the same 890 IRQ 5 interrupts, but CPU usage is different:
 	79.2%Sys  20.8%Intr  0.0%User  0.0%Nice  0.0%Idle
 
 A couple of other IRQs go up to 30-40 range, nothing huge.
 
 A single "fwcontrol -u 0 -r" generates 14 interrupts on IRQ 5
 and 4 interrupts on 3 other IRQs.
 
 A single "fwcontrol -u 1 -r" generates 14 interrupts on IRQ 5
 and 3 interrupts on 2 other IRQs.  yawn
 
 ------
 
 With the bus reset printfs commented out, I get:
 
 -u 0
 
    80.0%Sys   4.8%Intr 13.9%User  0.0%Nice  1.2%Idle
 
    3300 interrupts on the onboard firewire controller
 
 -u 1
 
    84.8%Sys   1.1%Intr 14.1%User  0.0%Nice  0.0%Idle
 
    3900 interrupts on the PCI card firewire controller
 
    After awhile the console says:
 
    fwohci1: phy read failed(1). i = 100 retry = 0
    fwohci1: phy read failed(1). i = 100 retry = 1
    fwohci1: phy read failed(1). i = 100 retry = 2
    fwohci1: phy read failed(1). i = 100 retry = 3
    ...
 
    and the system becomes unresponsive.  Reset button required.  :-(
 
 Observations:
 
 Kernel printfs to console appear to take more interrupts and
 more CPU than userland output to console.
 
 Seems odd that the onboard controller uses so much CPU for
 interrupts in the with-printf case, but there wasn't a
 large number on interrupts.  The without-printf case had
 a LOT more interrupts from the fw controller.
 
 Resetting the firewire bus in an endless loop is not recommended.
 
 I'm not an expert on interrupt storms, but none of this looks
 like an interrupt storm to me.  (Assuming systat is trustworthy.)
 
 Looks like the kernel printf takes a *lot* of CPU, and
 somehow locks out Ethernet.

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Cc:  
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Fri, 19 Dec 2008 11:02:21 +0000

 Here are the results of two more experiments:
 
 I tried to verify what spl the firewire code is at when calling
 printf(9) by adding the following block into fwohci.c just
 before one of the printfs.
 
 {
 #include <sys/types.h>
 #include <sys/systm.h>
   intrmask_t debug_spl;  /* __uint32_t */
   intrmask_t debug_spl_high;
   intrmask_t debug_spl_tty;
   intrmask_t debug_spl_fw;
   intrmask_t debug_spl_0;
   intrmask_t debug_spl_net;
 
   debug_spl = splhigh();
   debug_spl_high = spltty();
   debug_spl_tty = splfw();
   debug_spl_fw = splnet();
   debug_spl_net = splhigh();
   spl0();  /* void */
   debug_spl_0 = splhigh();
   splx(debug_spl);
 
   printf("fwohci_intr_core(): spl = 0x%x\n splhigh=0x%x spltty=0x%x splfw=0x%x splnet=0x%x spl0=0x%x\n",
          debug_spl, debug_spl_high, debug_spl_tty, debug_spl_fw, debug_spl_net, debug_spl_0);
 }
 
 But my results appear bogus:
 
 fwohci_intr_core(): spl = 0x0
  splhigh=0x0 spltty=0x0 splfw=0x0 splnet=0x0 spl0=0x0
 
 I have examined my code and the spl(9) man page several times
 but I can't find what is wrong.  Any clues?
 
 --------------------------------------------------------
 
 To isolate the effects of printf(9) from the firewire bus reset,
 I picked a trivial system call (chown(2)) and added some printf(9)
 calls.
 
 Calling chown several times and monitoring with systat -vmstat gives:
   1098 interrupts on the console IRQ
   93.1%Sys   6.7%Intr  0.2%User  0.0%Nice  0.0%Idle
 
 This did NOT interfere with Ethernet.
 
 So printf(9) interferes with Ethernet when called from the
 firewire driver, but not when called from a vanilla system call.
 
 --------------------------------------------------------
 
 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 sio0: type 16550A, console
 sio0: [FILTER]
 
 What does [FILTER] mean?  I don't see an explanation on the sio man page.

From: Sean Bruno <sbruno@miralink.com>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU,	causing data to be
 lost
Date: Sat, 20 Dec 2008 18:49:21 -0800

 Dieter wrote:
 > Here are the results of two more experiments:
 >
 > I tried to verify what spl the firewire code is at when calling
 > printf(9) by adding the following block into fwohci.c just
 > before one of the printfs.
 >
 > {
 > #include <sys/types.h>
 > #include <sys/systm.h>
 >   intrmask_t debug_spl;  /* __uint32_t */
 >   intrmask_t debug_spl_high;
 >   intrmask_t debug_spl_tty;
 >   intrmask_t debug_spl_fw;
 >   intrmask_t debug_spl_0;
 >   intrmask_t debug_spl_net;
 >
 >   debug_spl = splhigh();
 >   debug_spl_high = spltty();
 >   debug_spl_tty = splfw();
 >   debug_spl_fw = splnet();
 >   debug_spl_net = splhigh();
 >   spl0();  /* void */
 >   debug_spl_0 = splhigh();
 >   splx(debug_spl);
 >
 >   printf("fwohci_intr_core(): spl = 0x%x\n splhigh=0x%x spltty=0x%x splfw=0x%x splnet=0x%x spl0=0x%x\n",
 >          debug_spl, debug_spl_high, debug_spl_tty, debug_spl_fw, debug_spl_net, debug_spl_0);
 > }
 >
 > But my results appear bogus:
 >
 > fwohci_intr_core(): spl = 0x0
 >  splhigh=0x0 spltty=0x0 splfw=0x0 splnet=0x0 spl0=0x0
 >
 > I have examined my code and the spl(9) man page several times
 > but I can't find what is wrong.  Any clues?
 >
 > --------------------------------------------------------
 >
 > To isolate the effects of printf(9) from the firewire bus reset,
 > I picked a trivial system call (chown(2)) and added some printf(9)
 > calls.
 >
 > Calling chown several times and monitoring with systat -vmstat gives:
 >   1098 interrupts on the console IRQ
 >   93.1%Sys   6.7%Intr  0.2%User  0.0%Nice  0.0%Idle
 >
 > This did NOT interfere with Ethernet.
 >
 > So printf(9) interferes with Ethernet when called from the
 > firewire driver, but not when called from a vanilla system call.
 >
 >   
 I setup my system to execute a bus reset every 1 second, simultaneously, 
 I started copying a large
 file onto the system see if anything would happen.
 
 I saw no change to copy speed reported by SCP during the entire 
 transaction.  I also see no change
 to the system load while this is occurring.
 
 This seems indicative of "something else" going on.
 > --------------------------------------------------------
 >
 > sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 > sio0: type 16550A, console
 > sio0: [FILTER]
 >
 > What does [FILTER] mean?  I don't see an explanation on the sio man page.
 >   
 
 Did you mean to ask a different list?  Because I have no idea.  :)

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Cc:  
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Sat, 20 Dec 2008 21:24:23 +0000

 In message <494DAEB1.1070909@miralink.com>, Sean Bruno writes:
 
 > I setup my system to execute a bus reset every 1 second, simultaneously, 
 > I started copying a large
 > file onto the system see if anything would happen.
 > 
 > I saw no change to copy speed reported by SCP during the entire 
 > transaction.  I also see no change
 > to the system load while this is occurring.
 
 Does your system have a RS-232 console or VGA console?
 The problem might be specific to RS-232.
 
 > This seems indicative of "something else" going on.
 
 Any idea what this "something else" might be?
 
 Does anyone have a clue why my spl calls are returning 0 ?

From: Sean Bruno <sbruno@miralink.com>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU,	causing data to be
 lost
Date: Mon, 22 Dec 2008 19:21:49 -0800

 Dieter wrote:
 > In message <494DAEB1.1070909@miralink.com>, Sean Bruno writes:
 >
 >   
 >> I setup my system to execute a bus reset every 1 second, simultaneously, 
 >> I started copying a large
 >> file onto the system see if anything would happen.
 >>
 >> I saw no change to copy speed reported by SCP during the entire 
 >> transaction.  I also see no change
 >> to the system load while this is occurring.
 >>     
 >
 > Does your system have a RS-232 console or VGA console?
 > The problem might be specific to RS-232.
 >
 >   
 I have both.  I disabled my serial console to test, no change when using
 the VGA console only.
 
 >> This seems indicative of "something else" going on.
 >>     
 >
 > Any idea what this "something else" might be?
 >
 > Does anyone have a clue why my spl calls are returning 0 ?
 >
 >   
 I confirmed that spl's are complete no-ops since rel 5.  So, you want to 
 ignore
 them as they are just markers now where locking should be implemented.
 
 I went back in your original submission and I can see a significant drop 
 in I/O
 when resetting the F/W bus.   If I have the following running:
 dd if=/dev/zero of=/dev/ad6 bs=64k
 
 Then I see about 64MB/S normally.  When I reset any firewire bus, I see 
 the throughput
 drop as reported by iostat by a significant amount(2-4 MB/s).
 
 I am assuming that the firewire driver is holding a very heavy lock here 
 that is cascading
 down to the printf().  At least, that would be my ASSumption.  :)
 
 I will continue to poke around.  You are not crazy Dieter.
 
 Sean
 
 

From: Sean Bruno <sbruno@miralink.com>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU,	causing data to be
 lost
Date: Mon, 22 Dec 2008 22:01:55 -0800

 Sean Bruno wrote:
 > Dieter wrote:
 >> In message <494DAEB1.1070909@miralink.com>, Sean Bruno writes:
 >>
 >>  
 >>> I setup my system to execute a bus reset every 1 second, 
 >>> simultaneously, I started copying a large
 >>> file onto the system see if anything would happen.
 >>>
 >>> I saw no change to copy speed reported by SCP during the entire 
 >>> transaction.  I also see no change
 >>> to the system load while this is occurring.
 >>>     
 >>
 >> Does your system have a RS-232 console or VGA console?
 >> The problem might be specific to RS-232.
 >>
 >>   
 > I have both.  I disabled my serial console to test, no change when using
 > the VGA console only.
 >
 >>> This seems indicative of "something else" going on.
 >>>     
 >>
 >> Any idea what this "something else" might be?
 >>
 >> Does anyone have a clue why my spl calls are returning 0 ?
 >>
 >>   
 > I confirmed that spl's are complete no-ops since rel 5.  So, you want 
 > to ignore
 > them as they are just markers now where locking should be implemented.
 >
 > I went back in your original submission and I can see a significant 
 > drop in I/O
 > when resetting the F/W bus.   If I have the following running:
 > dd if=/dev/zero of=/dev/ad6 bs=64k
 >
 > Then I see about 64MB/S normally.  When I reset any firewire bus, I 
 > see the throughput
 > drop as reported by iostat by a significant amount(2-4 MB/s).
 >
 > I am assuming that the firewire driver is holding a very heavy lock 
 > here that is cascading
 > down to the printf().  At least, that would be my ASSumption.  :)
 >
 > I will continue to poke around.  You are not crazy Dieter.
 >
 > Sean
 >
 >
 Hrm ... Looking over the log messages that are generated during a bus 
 reset, there's
 a mish-mash of printf() and device_printf() calls.  I'm not sure what 
 the impact of that
 is to real behavior, but /var/log/messages has a tendency to get garbled 
 like this:
 
 Dec 22 16:00:18 home-test kernel: fwohci1: Initiate bus reset
 Dec 22 16:00:18 home-test kernel: fwohci1: BUS reset
 Dec 22 16:00:18 home-test kernel: fwohci1: node_id=0xc800ffc0, gen=8, 
 CYCLEMASTER mode
 Dec 22 16:00:18 home-test kernel: firewi
 Dec 22 16:00:18 home-test kernel: re1:
 Dec 22 16:00:18 home-test kernel: 1 n
 Dec 22 16:00:18 home-test kernel: odes
 Dec 22 16:00:18 home-test kernel: , ma
 Dec 22 16:00:18 home-test kernel: xhop
 Dec 22 16:00:18 home-test kernel: <=
 Dec 22 16:00:18 home-test kernel: 0, c
 Dec 22 16:00:18 home-test kernel: able
 Dec 22 16:00:18 home-test kernel: IRM
 Dec 22 16:00:18 home-test kernel: = 0
 Dec 22 16:00:18 home-test kernel: (me
 Dec 22 16:00:18 home-test kernel: )
 Dec 22 16:00:18 home-test kernel: f
 Dec 22 16:00:18 home-test kernel: irew
 Dec 22 16:00:18 home-test kernel: ire1
 Dec 22 16:00:18 home-test kernel: : bu
 Dec 22 16:00:18 home-test kernel: s ma
 Dec 22 16:00:18 home-test kernel: nage
 Dec 22 16:00:18 home-test kernel: r 0
 Dec 22 16:00:18 home-test kernel: (me)
 Dec 22 16:00:18 home-test kernel:
 
 
 Let me try to eliminate this behaviour first.
 
 Sean

From: Dieter <freebsd@sopwith.solgatos.com>
To: Sean Bruno <sbruno@miralink.com>
Cc: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Thu, 25 Dec 2008 22:23:09 +0000

 > > I confirmed that spl's are complete no-ops since rel 5.  So, you want 
 > > to ignore
 > > them as they are just markers now where locking should be implemented.
 
 I hunted down the spl code, and you're right.  Wow, I wonder how drivers
 still using spl calls work at all?
 
 > is to real behavior, but /var/log/messages has a tendency to get garbled 
 > like this:
 > 
 > Dec 22 16:00:18 home-test kernel: fwohci1: Initiate bus reset
 > Dec 22 16:00:18 home-test kernel: fwohci1: BUS reset
 > Dec 22 16:00:18 home-test kernel: fwohci1: node_id=0xc800ffc0, gen=8, 
 > CYCLEMASTER mode
 > Dec 22 16:00:18 home-test kernel: firewi
 > Dec 22 16:00:18 home-test kernel: re1:
 > Dec 22 16:00:18 home-test kernel: 1 n
 > Dec 22 16:00:18 home-test kernel: odes
 > Dec 22 16:00:18 home-test kernel: , ma
 > Dec 22 16:00:18 home-test kernel: xhop
 > Dec 22 16:00:18 home-test kernel: <=
 > Dec 22 16:00:18 home-test kernel: 0, c
 > Dec 22 16:00:18 home-test kernel: able
 
 Do the lines get folded on the console, or only in /var/log/messages?

From: Sean Bruno <sbruno@miralink.com>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be
 lost
Date: Sun, 28 Dec 2008 14:15:36 -0800

 Dieter wrote:
 >>> I confirmed that spl's are complete no-ops since rel 5.  So, you want 
 >>> to ignore
 >>> them as they are just markers now where locking should be implemented.
 >>>       
 >
 > I hunted down the spl code, and you're right.  Wow, I wonder how drivers
 > still using spl calls work at all?
 >
 >   
 I believe that the spl() calls are just left there as a hint where 
 locking should be.
 
 As far as I understand, we need to pay attention to the mutex locks.
 >> is to real behavior, but /var/log/messages has a tendency to get garbled 
 >> like this:
 >>
 >> Dec 22 16:00:18 home-test kernel: fwohci1: Initiate bus reset
 >> Dec 22 16:00:18 home-test kernel: fwohci1: BUS reset
 >> Dec 22 16:00:18 home-test kernel: fwohci1: node_id=0xc800ffc0, gen=8, 
 >> CYCLEMASTER mode
 >> Dec 22 16:00:18 home-test kernel: firewi
 >> Dec 22 16:00:18 home-test kernel: re1:
 >> Dec 22 16:00:18 home-test kernel: 1 n
 >> Dec 22 16:00:18 home-test kernel: odes
 >> Dec 22 16:00:18 home-test kernel: , ma
 >> Dec 22 16:00:18 home-test kernel: xhop
 >> Dec 22 16:00:18 home-test kernel: <=
 >> Dec 22 16:00:18 home-test kernel: 0, c
 >> Dec 22 16:00:18 home-test kernel: able
 >>     
 >
 > Do the lines get folded on the console, or only in /var/log/messages?
 >   
 
 As far as I can see, the console messages are fine.  It's only the 
 messages that get
 garbled.
 
 Sean

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Cc:  
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Sun, 28 Dec 2008 21:49:00 +0000

 > >>> I confirmed that spl's are complete no-ops since rel 5.  So, you want 
 > >>> to ignore
 > >>> them as they are just markers now where locking should be implemented.
 > >>>       
 > >
 > > I hunted down the spl code, and you're right.  Wow, I wonder how drivers
 > > still using spl calls work at all?
 > >
 > >   
 > I believe that the spl() calls are just left there as a hint where 
 > locking should be.
 > 
 > As far as I understand, we need to pay attention to the mutex locks.
 
 I'll rephrase my question.  In the old days, locking was done with spl.
 The new way is with mutex.  But with the spl calls being replaced with
 noops, and as far as I can tell the driver is not using mutex, there
 doesn't appear to be any locking.  So the driver can step on itself.
 
 
 > >> is to real behavior, but /var/log/messages has a tendency to get garbled 
 > >> like this:
 > >>
 > >> Dec 22 16:00:18 home-test kernel: fwohci1: Initiate bus reset
 > >> Dec 22 16:00:18 home-test kernel: fwohci1: BUS reset
 > >> Dec 22 16:00:18 home-test kernel: fwohci1: node_id=0xc800ffc0, gen=8, 
 > >> CYCLEMASTER mode
 > >> Dec 22 16:00:18 home-test kernel: firewi
 > >> Dec 22 16:00:18 home-test kernel: re1:
 > >> Dec 22 16:00:18 home-test kernel: 1 n
 > >> Dec 22 16:00:18 home-test kernel: odes
 > >> Dec 22 16:00:18 home-test kernel: , ma
 > >> Dec 22 16:00:18 home-test kernel: xhop
 > >> Dec 22 16:00:18 home-test kernel: <=
 > >> Dec 22 16:00:18 home-test kernel: 0, c
 > >> Dec 22 16:00:18 home-test kernel: able
 > >>     
 > >
 > > Do the lines get folded on the console, or only in /var/log/messages?
 > >   
 > 
 > As far as I can see, the console messages are fine.  It's only the 
 > messages that get
 > garbled.
 
 Perhaps an artifact of syslogd?

From: Sean Bruno <sbruno@miralink.com>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/118093: firewire bus reset hogs CPU,	causing data to be
 lost
Date: Sun, 28 Dec 2008 22:22:26 -0800

 Dieter wrote:
 >>>
 >>>   
 >>>       
 >> I believe that the spl() calls are just left there as a hint where 
 >> locking should be.
 >>
 >> As far as I understand, we need to pay attention to the mutex locks.
 >>     
 >
 > I'll rephrase my question.  In the old days, locking was done with spl.
 > The new way is with mutex.  But with the spl calls being replaced with
 > noops, and as far as I can tell the driver is not using mutex, there
 > doesn't appear to be any locking.  So the driver can step on itself.
 >
 >   
 Well, there is locking around a couple of mutex's via FW_GLOCK().
 
 It appears that the locking is not robust, and that is one of the issues 
 that I am
 looking into right now.
 >   
 >>>> is to real behavior, but /var/log/messages has a tendency to get garbled 
 >>>> like this:
 >>>>
 >>>> Dec 22 16:00:18 home-test kernel: fwohci1: Initiate bus reset
 >>>> Dec 22 16:00:18 home-test kernel: fwohci1: BUS reset
 >>>> Dec 22 16:00:18 home-test kernel: fwohci1: node_id=0xc800ffc0, gen=8, 
 >>>> CYCLEMASTER mode
 >>>> Dec 22 16:00:18 home-test kernel: firewi
 >>>> Dec 22 16:00:18 home-test kernel: re1:
 >>>> Dec 22 16:00:18 home-test kernel: 1 n
 >>>> Dec 22 16:00:18 home-test kernel: odes
 >>>> Dec 22 16:00:18 home-test kernel: , ma
 >>>> Dec 22 16:00:18 home-test kernel: xhop
 >>>> Dec 22 16:00:18 home-test kernel: <=
 >>>> Dec 22 16:00:18 home-test kernel: 0, c
 >>>> Dec 22 16:00:18 home-test kernel: able
 >>>>     
 >>>>         
 >>> Do the lines get folded on the console, or only in /var/log/messages?
 >>>   
 >>>       
 >> As far as I can see, the console messages are fine.  It's only the 
 >> messages that get
 >> garbled.
 >>     
 >
 > Perhaps an artifact of syslogd?
 >   
 I doubt it.  I'm working on a patch that improves the locking a bit and 
 does some
 other "gross" things to try and keep things from flying apart.
 
 I've SEEN this behaviour in my implementations with sbp_targ and 
 couldn't pin it down.
 
 Scott Long gave me a couple of pointers this evening, but I'm still 
 working on locking
 down the taskqueues and some of the callback_handlers.  There are some 
 bad things going
 on specifically during initialization that are pre-empting normal operation.
 
 Sean

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Cc:  
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Sun, 28 Dec 2008 23:30:55 +0000

 > > I'll rephrase my question.  In the old days, locking was done with spl.
 > > The new way is with mutex.  But with the spl calls being replaced with
 > > noops, and as far as I can tell the driver is not using mutex, there
 > > doesn't appear to be any locking.  So the driver can step on itself.
 > 
 > 
 > Well, there is locking around a couple of mutex's via FW_GLOCK().
 
 Ah, I wasn't grepping for the right string.  So there *is* mutex
 locking.  Although the lingering spl calls are still troubling.

From: Sean Bruno <sean.bruno@dsl-only.net>
To: bug-followup@FreeBSD.org, freebsd@sopwith.solgatos.com
Cc: slong@freebsd.org
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing
 data to be lost
Date: Mon, 05 Jan 2009 10:23:56 -0800

 --=-Y8aWkChKny4g1UrmtYlV
 Content-Type: text/plain
 Content-Transfer-Encoding: 7bit
 
 Well ... this has really sent me down the rabbit hole the last couple of
 days.
 
 There is a need to audit all locking in the firewire stack right now and
 I have started that task.
 
 Essentially, threads, callouts, interrupts and task queues are all
 jumping around causing context to be switched from one thread to
 another.  It's kind of bad in there and I need to sort it out.
 
 I'm working with a stable/7 tree, so I've started a patch for it.  This
 patch has quite a few printf -> device_printf changes in it, so try to 
 ignore thost for now.
 
 The meat of the patch is a judicious implementation of FW_GLOCK() in
 certain code areas.  Note that sometimes the code is just trying to get
 the lock and then drops it immediately.  This is not very optimal, but
 it does the trick.
 
 I'm still seeing a high level of broken log messages
 in /var/log/messages but this may help the issue you were seeing.  Give
 it a spin.
 
 Sean
 
 --=-Y8aWkChKny4g1UrmtYlV
 Content-Disposition: attachment; filename="firewire.diff"
 Content-Type: text/x-patch; name="firewire.diff"; charset="UTF-8"
 Content-Transfer-Encoding: 7bit
 
 Index: fwohci_pci.c
 ===================================================================
 --- fwohci_pci.c	(revision 186781)
 +++ fwohci_pci.c	(working copy)
 @@ -337,11 +337,7 @@
  
  	err = bus_setup_intr(self, sc->irq_res,
  			INTR_TYPE_NET | INTR_MPSAFE,
 -#if FWOHCI_INTFILT
 -		     fwohci_filt, NULL, sc, &sc->ih);
 -#else
  		     NULL, (driver_intr_t *) fwohci_intr, sc, &sc->ih);
 -#endif
  #if defined(__DragonFly__) || __FreeBSD_version < 500000
  	/* XXX splcam() should mask this irq for sbp.c*/
  	err = bus_setup_intr(self, sc->irq_res, INTR_TYPE_CAM,
 Index: firewire.c
 ===================================================================
 --- firewire.c	(revision 186781)
 +++ firewire.c	(working copy)
 @@ -1240,8 +1240,8 @@
  	fp->mode.common.tcode |= FWTCODE_PHY;
  
  	if (firewire_debug)
 -		printf("send phy_config root_node=%d gap_count=%d\n",
 -						root_node, gap_count);
 +		device_printf(fc->dev, "%s: phy_config root_node=%d gap_count=%d\n",
 +				__func__, root_node, gap_count);
  	fw_asyreq(fc, -1, xfer);
  }
  
 @@ -1285,7 +1285,7 @@
  	self_id = &fc->topology_map->self_id[0];
  	for(i = 0; i < fc->sid_cnt; i ++){
  		if (sid[1] != ~sid[0]) {
 -			printf("fw_sidrcv: invalid self-id packet\n");
 +			device_printf(fc->bdev, "%s: invalid self-id packet\n", __func__);
  			sid += 2;
  			continue;
  		}
 @@ -1331,7 +1331,6 @@
  		self_id++;
  		fc->topology_map->self_id_count ++;
  	}
 -	device_printf(fc->bdev, "%d nodes", fc->max_node + 1);
  	/* CRC */
  	fc->topology_map->crc = fw_crc16(
  			(uint32_t *)&fc->topology_map->generation,
 @@ -1350,16 +1349,16 @@
  	bcopy(p, &CSRARC(fc, SPED_MAP + 8), (fc->speed_map->crc_len - 1)*4);
  
  	fc->max_hop = fc->max_node - i_branch;
 -	printf(", maxhop <= %d", fc->max_hop);
 +	device_printf(fc->bdev, "%s: %d nodes, maxhop <= %d\n",
 +			__func__, fc->max_node + 1, fc->max_hop);
  		
  	if(fc->irm == -1 ){
 -		printf(", Not found IRM capable node");
 +		device_printf(fc->bdev, "%s: No IRM capable node", __func__);
  	}else{
 -		printf(", cable IRM = %d", fc->irm);
 -		if (fc->irm == fc->nodeid)
 -			printf(" (me)");
 +		device_printf(fc->bdev, "%s: IRM capable node = %d, %s\n",
 +				 __func__, fc->irm,
 +				(fc->irm == fc->nodeid) ? " (me)" : "");
  	}
 -	printf("\n");
  
  	if (try_bmr && (fc->irm != -1) && (CSRARC(fc, BUS_MGR_ID) == 0x3f)) {
  		if (fc->irm == fc->nodeid) {
 @@ -1388,6 +1387,7 @@
  	struct fw_device *fwdev;
  
  	s = splfw();
 +	FW_GLOCK(fc);
  	fc->status = FWBUSEXPLORE;
  
  	/* Invalidate all devices, just after bus reset. */
 @@ -1396,6 +1396,7 @@
  			fwdev->status = FWDEVINVAL;
  			fwdev->rcnt = 0;
  		}
 +	FW_GUNLOCK(fc);
  	splx(s);
  
  	wakeup((void *)fc);
 @@ -1647,20 +1648,27 @@
  
  	fc = (struct firewire_comm *)arg;
  
 -	mtx_lock(&fc->wait_lock);
 +	/*mtx_lock(&fc->wait_lock);*/
 +	FW_GLOCK(fc);
  	while (fc->status != FWBUSDETACH) {
 +		FW_GUNLOCK(fc);
  		if (fc->status == FWBUSEXPLORE) {
 -			mtx_unlock(&fc->wait_lock);
 +			/*mtx_unlock(&fc->wait_lock);*/
  			fw_explore(fc);
  			fc->status = FWBUSEXPDONE;
 -			if (firewire_debug)
 -				printf("bus_explore done\n");
 +			if (firewire_debug) {
 +				device_printf(fc->dev,"%s: bus_explore done\n",
 +						__func__);
 +			}
  			fw_attach_dev(fc);
 -			mtx_lock(&fc->wait_lock);
 +			/*mtx_lock(&fc->wait_lock);*/
  		}
 -		msleep((void *)fc, &fc->wait_lock, PWAIT|PCATCH, "-", 0);
 +		FW_GLOCK(fc);
 +		/*msleep((void *)fc, &fc->wait_lock, PWAIT|PCATCH, "-", 0);*/
 +		msleep((void *)fc, FW_GMTX(fc), PWAIT|PCATCH, "-", 0);
  	}
 -	mtx_unlock(&fc->wait_lock);
 +	/*mtx_unlock(&fc->wait_lock);*/
 +	FW_GUNLOCK(fc);
  	kthread_exit(0);
  }
  
 Index: fwohci.c
 ===================================================================
 --- fwohci.c	(revision 186781)
 +++ fwohci.c	(working copy)
 @@ -1003,7 +1003,8 @@
  	if (maxdesc < db_tr->dbcnt) {
  		maxdesc = db_tr->dbcnt;
  		if (firewire_debug)
 -			device_printf(sc->fc.dev, "maxdesc: %d\n", maxdesc);
 +			device_printf(sc->fc.dev, "%s: maxdesc: %d\n",
 +					__func__, maxdesc);
  	}
  	/* last db */
  	LAST_DB(db_tr, db);
 @@ -1024,7 +1025,8 @@
  	if(db_tr != dbch->bottom){
  		goto txloop;
  	} else {
 -		device_printf(sc->fc.dev, "fwohci_start: lack of db_trq\n");
 +		device_printf(sc->fc.dev, "%s: lack of db_trq\n",
 +				__func__);
  		dbch->flags |= FWOHCI_DBCH_FULL;
  	}
  kick:
 @@ -1036,8 +1038,8 @@
  		OWRITE(sc, OHCI_DMACTL(off), OHCI_CNTL_DMA_WAKE);
  	} else {
  		if (firewire_debug)
 -			device_printf(sc->fc.dev, "start AT DMA status=%x\n",
 -					OREAD(sc, OHCI_DMACTL(off)));
 +			device_printf(sc->fc.dev, "%s: start AT DMA status=%x\n",
 +					__func__, OREAD(sc, OHCI_DMACTL(off)));
  		OWRITE(sc, OHCI_DMACMD(off), dbch->top->bus_addr | fsegment);
  		OWRITE(sc, OHCI_DMACTL(off), OHCI_CNTL_DMA_RUN);
  		dbch->xferq.flag |= FWXFERQ_RUNNING;
 @@ -1848,7 +1850,7 @@
  		/* Disable bus reset interrupt until sid recv. */
  		OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_PHY_BUS_R);
  	
 -		device_printf(fc->dev, "BUS reset\n");
 +		device_printf(fc->dev, "%s: BUS reset\n", __func__);
  		OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_CYC_LOST);
  		OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCSRC);
  
 @@ -1885,10 +1887,10 @@
  		plen = OREAD(sc, OHCI_SID_CNT);
  
  		fc->nodeid = node_id & 0x3f;
 -		device_printf(fc->dev, "node_id=0x%08x, gen=%d, ",
 -			node_id, (plen >> 16) & 0xff);
 +		device_printf(fc->dev, "%s: node_id=0x%08x, gen=%d\n",
 +			__func__, node_id, (plen >> 16) & 0xff);
  		if (!(node_id & OHCI_NODE_VALID)) {
 -			printf("Bus reset failure\n");
 +			device_printf(fc->dev, "%s: Bus reset failure\n", __func__);
  			goto sidout;
  		}
  
 @@ -1896,11 +1898,11 @@
  		sc->cycle_lost = 0;
  		OWRITE(sc, FWOHCI_INTMASK,  OHCI_INT_CYC_LOST);
  		if ((node_id & OHCI_NODE_ROOT) && !nocyclemaster) {
 -			printf("CYCLEMASTER mode\n");
 +			device_printf(fc->dev, "%s: CYCLEMASTER mode\n", __func__);
  			OWRITE(sc, OHCI_LNKCTL,
  				OHCI_CNTL_CYCMTR | OHCI_CNTL_CYCTIMER);
  		} else {
 -			printf("non CYCLEMASTER mode\n");
 +			device_printf(fc->dev, "%s: non CYCLEMASTER mode\n", __func__);
  			OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCMTR);
  			OWRITE(sc, OHCI_LNKCTL, OHCI_CNTL_CYCTIMER);
  		}
 @@ -1922,6 +1924,8 @@
  	u_int i;
  	struct firewire_comm *fc = (struct firewire_comm *)sc;
  
 +	FW_GLOCK(fc);
 +	FW_GUNLOCK(fc);
  	if (stat & OHCI_INT_DMA_IR) {
  		irstat = atomic_readandclear_int(&sc->irstat);
  		for(i = 0; i < fc->nisodma ; i++){
 @@ -1969,8 +1973,8 @@
  			OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCTIMER);
  #endif
  			OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_CYC_LOST);
 -			device_printf(fc->dev, "too many cycle lost, "
 -			 "no cycle master presents?\n");
 +			device_printf(fc->dev, "%s: too many cycle lost, "
 +			 "no cycle master presents?\n", __func__);
  		}
  	}
  	if (stat & OHCI_INT_DMA_ATRQ) {
 @@ -1986,7 +1990,7 @@
  		device_printf(fc->dev, "unrecoverable error\n");
  	}
  	if (stat & OHCI_INT_PHY_INT) {
 -		device_printf(fc->dev, "phy int\n");
 +		device_printf(fc->dev, "%s: phy int\n", __func__);
  	}
  
  	return;
 @@ -1997,6 +2001,8 @@
  {
  	struct fwohci_softc *sc = (struct fwohci_softc *)arg;
  
 +	FW_GLOCK(&sc->fc);
 +	FW_GUNLOCK(&sc->fc);
  	fw_busreset(&sc->fc, FWBUSRESET);
  	OWRITE(sc, OHCI_CROMHDR, ntohl(sc->fc.config_rom[0]));
  	OWRITE(sc, OHCI_BUS_OPT, ntohl(sc->fc.config_rom[2]));
 @@ -2011,6 +2017,8 @@
  	int i, plen;
  
  
 +	FW_GLOCK(fc);
 +	FW_GUNLOCK(fc);
  	plen = OREAD(sc, OHCI_SID_CNT);
  
  	if (plen & OHCI_SID_ERR) {
 @@ -2062,7 +2070,16 @@
  {
  	uint32_t stat, irstat, itstat;
  
 +	/*
 +	 * Try and grab the global lock
 +	 * for a moment to keep us from
 +	 * firing while we are still 
 +	 * setting the interrupt condition
 +	 * on the bus
 +	 */
 +	FW_GLOCK(&sc->fc);
  	stat = OREAD(sc, FWOHCI_INTSTAT);
 +	FW_GUNLOCK(&sc->fc);
  	if (stat == 0xffffffff) {
  		device_printf(sc->fc.dev, 
  			"device physically ejected?\n");
 @@ -2091,25 +2108,15 @@
  	return (FILTER_HANDLED);
  }
  
 -int
 -fwohci_filt(void *arg)
 +void
 +fwohci_intr(void *arg)
  {
  	struct fwohci_softc *sc = (struct fwohci_softc *)arg;
  
 -	if (!(sc->intmask & OHCI_INT_EN)) {
 -		/* polling mode */
 -		return (FILTER_STRAY);
 -	}
 -	return (fwohci_check_stat(sc));
 +	fwohci_check_stat(sc);
  }
  
  void
 -fwohci_intr(void *arg)
 -{
 -	fwohci_filt(arg);
 -}
 -
 -void
  fwohci_poll(struct firewire_comm *fc, int quick, int count)
  {
  	struct fwohci_softc *sc = (struct fwohci_softc *)fc;
 @@ -2464,6 +2471,7 @@
  	struct fwohci_softc *sc;
  	uint32_t fun;
  
 +	FW_GLOCK(fc);
  	device_printf(fc->dev, "Initiate bus reset\n");
  	sc = (struct fwohci_softc *)fc;
  
 @@ -2487,6 +2495,7 @@
  	fun |= FW_PHY_ISBR | FW_PHY_RHB;
  	fun = fwphy_wrdata(sc, FW_PHY_ISBR_REG, fun);
  #endif
 +	FW_GUNLOCK(fc);
  }
  
  void
 Index: fwohcivar.h
 ===================================================================
 --- fwohcivar.h	(revision 186781)
 +++ fwohcivar.h	(working copy)
 @@ -37,12 +37,6 @@
  
  #include <sys/taskqueue.h>
  
 -#if defined(__DragonFly__) ||  __FreeBSD_version < 700043
 -#define FWOHCI_INTFILT	0
 -#else
 -#define FWOHCI_INTFILT	1
 -#endif
 -
  typedef struct fwohci_softc {
  	struct firewire_comm fc;
  	bus_space_tag_t bst;
 @@ -84,7 +78,6 @@
  } fwohci_softc_t;
  
  void fwohci_intr (void *arg);
 -int fwohci_filt (void *arg);
  int fwohci_init (struct fwohci_softc *, device_t);
  void fwohci_poll (struct firewire_comm *, int, int);
  void fwohci_reset (struct fwohci_softc *, device_t);
 
 --=-Y8aWkChKny4g1UrmtYlV--
 

From: Sean Bruno <sean.bruno@dsl-only.net>
To: bug-followup@FreeBSD.org, freebsd@sopwith.solgatos.com,  freebsd-firewire@freebsd.org
Cc: scottl@freebsd.org
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing
 data to be lost
Date: Mon, 05 Jan 2009 10:56:37 -0800

 --=-bdk1tSakYWl4EHnChn2C
 Content-Type: text/plain
 Content-Transfer-Encoding: 7bit
 
 Well ... this has really sent me down the rabbit hole the last couple of
 days.
 
 There is a need to audit all locking in the firewire stack right now and
 I have started that task.
 
 Essentially, threads, callouts, interrupts and task queues are all
 jumping around causing context to be switched from one thread to
 another.  It's kind of bad in there and I need to sort it out.
 
 I'm working with a stable/7 tree, so I've started a patch for it.  This
 patch has quite a few printf -> device_printf changes in it, so try to 
 ignore thost for now.
 
 The meat of the patch is a judicious implementation of FW_GLOCK() in
 certain code areas.  Note that sometimes the code is just trying to get
 the lock and then drops it immediately.  This is not very optimal, but
 it does the trick.
 
 I'm still seeing a high level of broken log messages
 in /var/log/messages but this may help the issue you were seeing.  Give
 it a spin.
 
 Sean
 
 --=-bdk1tSakYWl4EHnChn2C
 Content-Disposition: attachment; filename="firewire.diff"
 Content-Type: text/x-patch; name="firewire.diff"; charset="UTF-8"
 Content-Transfer-Encoding: 7bit
 
 Index: fwohci_pci.c
 ===================================================================
 --- fwohci_pci.c	(revision 186781)
 +++ fwohci_pci.c	(working copy)
 @@ -337,11 +337,7 @@
  
  	err = bus_setup_intr(self, sc->irq_res,
  			INTR_TYPE_NET | INTR_MPSAFE,
 -#if FWOHCI_INTFILT
 -		     fwohci_filt, NULL, sc, &sc->ih);
 -#else
  		     NULL, (driver_intr_t *) fwohci_intr, sc, &sc->ih);
 -#endif
  #if defined(__DragonFly__) || __FreeBSD_version < 500000
  	/* XXX splcam() should mask this irq for sbp.c*/
  	err = bus_setup_intr(self, sc->irq_res, INTR_TYPE_CAM,
 Index: firewire.c
 ===================================================================
 --- firewire.c	(revision 186781)
 +++ firewire.c	(working copy)
 @@ -1240,8 +1240,8 @@
  	fp->mode.common.tcode |= FWTCODE_PHY;
  
  	if (firewire_debug)
 -		printf("send phy_config root_node=%d gap_count=%d\n",
 -						root_node, gap_count);
 +		device_printf(fc->dev, "%s: phy_config root_node=%d gap_count=%d\n",
 +				__func__, root_node, gap_count);
  	fw_asyreq(fc, -1, xfer);
  }
  
 @@ -1285,7 +1285,7 @@
  	self_id = &fc->topology_map->self_id[0];
  	for(i = 0; i < fc->sid_cnt; i ++){
  		if (sid[1] != ~sid[0]) {
 -			printf("fw_sidrcv: invalid self-id packet\n");
 +			device_printf(fc->bdev, "%s: invalid self-id packet\n", __func__);
  			sid += 2;
  			continue;
  		}
 @@ -1331,7 +1331,6 @@
  		self_id++;
  		fc->topology_map->self_id_count ++;
  	}
 -	device_printf(fc->bdev, "%d nodes", fc->max_node + 1);
  	/* CRC */
  	fc->topology_map->crc = fw_crc16(
  			(uint32_t *)&fc->topology_map->generation,
 @@ -1350,16 +1349,16 @@
  	bcopy(p, &CSRARC(fc, SPED_MAP + 8), (fc->speed_map->crc_len - 1)*4);
  
  	fc->max_hop = fc->max_node - i_branch;
 -	printf(", maxhop <= %d", fc->max_hop);
 +	device_printf(fc->bdev, "%s: %d nodes, maxhop <= %d\n",
 +			__func__, fc->max_node + 1, fc->max_hop);
  		
  	if(fc->irm == -1 ){
 -		printf(", Not found IRM capable node");
 +		device_printf(fc->bdev, "%s: No IRM capable node", __func__);
  	}else{
 -		printf(", cable IRM = %d", fc->irm);
 -		if (fc->irm == fc->nodeid)
 -			printf(" (me)");
 +		device_printf(fc->bdev, "%s: IRM capable node = %d, %s\n",
 +				 __func__, fc->irm,
 +				(fc->irm == fc->nodeid) ? " (me)" : "");
  	}
 -	printf("\n");
  
  	if (try_bmr && (fc->irm != -1) && (CSRARC(fc, BUS_MGR_ID) == 0x3f)) {
  		if (fc->irm == fc->nodeid) {
 @@ -1388,6 +1387,7 @@
  	struct fw_device *fwdev;
  
  	s = splfw();
 +	FW_GLOCK(fc);
  	fc->status = FWBUSEXPLORE;
  
  	/* Invalidate all devices, just after bus reset. */
 @@ -1396,6 +1396,7 @@
  			fwdev->status = FWDEVINVAL;
  			fwdev->rcnt = 0;
  		}
 +	FW_GUNLOCK(fc);
  	splx(s);
  
  	wakeup((void *)fc);
 @@ -1647,20 +1648,27 @@
  
  	fc = (struct firewire_comm *)arg;
  
 -	mtx_lock(&fc->wait_lock);
 +	/*mtx_lock(&fc->wait_lock);*/
 +	FW_GLOCK(fc);
  	while (fc->status != FWBUSDETACH) {
 +		FW_GUNLOCK(fc);
  		if (fc->status == FWBUSEXPLORE) {
 -			mtx_unlock(&fc->wait_lock);
 +			/*mtx_unlock(&fc->wait_lock);*/
  			fw_explore(fc);
  			fc->status = FWBUSEXPDONE;
 -			if (firewire_debug)
 -				printf("bus_explore done\n");
 +			if (firewire_debug) {
 +				device_printf(fc->dev,"%s: bus_explore done\n",
 +						__func__);
 +			}
  			fw_attach_dev(fc);
 -			mtx_lock(&fc->wait_lock);
 +			/*mtx_lock(&fc->wait_lock);*/
  		}
 -		msleep((void *)fc, &fc->wait_lock, PWAIT|PCATCH, "-", 0);
 +		FW_GLOCK(fc);
 +		/*msleep((void *)fc, &fc->wait_lock, PWAIT|PCATCH, "-", 0);*/
 +		msleep((void *)fc, FW_GMTX(fc), PWAIT|PCATCH, "-", 0);
  	}
 -	mtx_unlock(&fc->wait_lock);
 +	/*mtx_unlock(&fc->wait_lock);*/
 +	FW_GUNLOCK(fc);
  	kthread_exit(0);
  }
  
 Index: fwohci.c
 ===================================================================
 --- fwohci.c	(revision 186781)
 +++ fwohci.c	(working copy)
 @@ -1003,7 +1003,8 @@
  	if (maxdesc < db_tr->dbcnt) {
  		maxdesc = db_tr->dbcnt;
  		if (firewire_debug)
 -			device_printf(sc->fc.dev, "maxdesc: %d\n", maxdesc);
 +			device_printf(sc->fc.dev, "%s: maxdesc: %d\n",
 +					__func__, maxdesc);
  	}
  	/* last db */
  	LAST_DB(db_tr, db);
 @@ -1024,7 +1025,8 @@
  	if(db_tr != dbch->bottom){
  		goto txloop;
  	} else {
 -		device_printf(sc->fc.dev, "fwohci_start: lack of db_trq\n");
 +		device_printf(sc->fc.dev, "%s: lack of db_trq\n",
 +				__func__);
  		dbch->flags |= FWOHCI_DBCH_FULL;
  	}
  kick:
 @@ -1036,8 +1038,8 @@
  		OWRITE(sc, OHCI_DMACTL(off), OHCI_CNTL_DMA_WAKE);
  	} else {
  		if (firewire_debug)
 -			device_printf(sc->fc.dev, "start AT DMA status=%x\n",
 -					OREAD(sc, OHCI_DMACTL(off)));
 +			device_printf(sc->fc.dev, "%s: start AT DMA status=%x\n",
 +					__func__, OREAD(sc, OHCI_DMACTL(off)));
  		OWRITE(sc, OHCI_DMACMD(off), dbch->top->bus_addr | fsegment);
  		OWRITE(sc, OHCI_DMACTL(off), OHCI_CNTL_DMA_RUN);
  		dbch->xferq.flag |= FWXFERQ_RUNNING;
 @@ -1848,7 +1850,7 @@
  		/* Disable bus reset interrupt until sid recv. */
  		OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_PHY_BUS_R);
  	
 -		device_printf(fc->dev, "BUS reset\n");
 +		device_printf(fc->dev, "%s: BUS reset\n", __func__);
  		OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_CYC_LOST);
  		OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCSRC);
  
 @@ -1885,10 +1887,10 @@
  		plen = OREAD(sc, OHCI_SID_CNT);
  
  		fc->nodeid = node_id & 0x3f;
 -		device_printf(fc->dev, "node_id=0x%08x, gen=%d, ",
 -			node_id, (plen >> 16) & 0xff);
 +		device_printf(fc->dev, "%s: node_id=0x%08x, gen=%d\n",
 +			__func__, node_id, (plen >> 16) & 0xff);
  		if (!(node_id & OHCI_NODE_VALID)) {
 -			printf("Bus reset failure\n");
 +			device_printf(fc->dev, "%s: Bus reset failure\n", __func__);
  			goto sidout;
  		}
  
 @@ -1896,11 +1898,11 @@
  		sc->cycle_lost = 0;
  		OWRITE(sc, FWOHCI_INTMASK,  OHCI_INT_CYC_LOST);
  		if ((node_id & OHCI_NODE_ROOT) && !nocyclemaster) {
 -			printf("CYCLEMASTER mode\n");
 +			device_printf(fc->dev, "%s: CYCLEMASTER mode\n", __func__);
  			OWRITE(sc, OHCI_LNKCTL,
  				OHCI_CNTL_CYCMTR | OHCI_CNTL_CYCTIMER);
  		} else {
 -			printf("non CYCLEMASTER mode\n");
 +			device_printf(fc->dev, "%s: non CYCLEMASTER mode\n", __func__);
  			OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCMTR);
  			OWRITE(sc, OHCI_LNKCTL, OHCI_CNTL_CYCTIMER);
  		}
 @@ -1922,6 +1924,8 @@
  	u_int i;
  	struct firewire_comm *fc = (struct firewire_comm *)sc;
  
 +	FW_GLOCK(fc);
 +	FW_GUNLOCK(fc);
  	if (stat & OHCI_INT_DMA_IR) {
  		irstat = atomic_readandclear_int(&sc->irstat);
  		for(i = 0; i < fc->nisodma ; i++){
 @@ -1969,8 +1973,8 @@
  			OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCTIMER);
  #endif
  			OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_CYC_LOST);
 -			device_printf(fc->dev, "too many cycle lost, "
 -			 "no cycle master presents?\n");
 +			device_printf(fc->dev, "%s: too many cycle lost, "
 +			 "no cycle master presents?\n", __func__);
  		}
  	}
  	if (stat & OHCI_INT_DMA_ATRQ) {
 @@ -1986,7 +1990,7 @@
  		device_printf(fc->dev, "unrecoverable error\n");
  	}
  	if (stat & OHCI_INT_PHY_INT) {
 -		device_printf(fc->dev, "phy int\n");
 +		device_printf(fc->dev, "%s: phy int\n", __func__);
  	}
  
  	return;
 @@ -1997,6 +2001,8 @@
  {
  	struct fwohci_softc *sc = (struct fwohci_softc *)arg;
  
 +	FW_GLOCK(&sc->fc);
 +	FW_GUNLOCK(&sc->fc);
  	fw_busreset(&sc->fc, FWBUSRESET);
  	OWRITE(sc, OHCI_CROMHDR, ntohl(sc->fc.config_rom[0]));
  	OWRITE(sc, OHCI_BUS_OPT, ntohl(sc->fc.config_rom[2]));
 @@ -2011,6 +2017,8 @@
  	int i, plen;
  
  
 +	FW_GLOCK(fc);
 +	FW_GUNLOCK(fc);
  	plen = OREAD(sc, OHCI_SID_CNT);
  
  	if (plen & OHCI_SID_ERR) {
 @@ -2062,7 +2070,16 @@
  {
  	uint32_t stat, irstat, itstat;
  
 +	/*
 +	 * Try and grab the global lock
 +	 * for a moment to keep us from
 +	 * firing while we are still 
 +	 * setting the interrupt condition
 +	 * on the bus
 +	 */
 +	FW_GLOCK(&sc->fc);
  	stat = OREAD(sc, FWOHCI_INTSTAT);
 +	FW_GUNLOCK(&sc->fc);
  	if (stat == 0xffffffff) {
  		device_printf(sc->fc.dev, 
  			"device physically ejected?\n");
 @@ -2091,25 +2108,15 @@
  	return (FILTER_HANDLED);
  }
  
 -int
 -fwohci_filt(void *arg)
 +void
 +fwohci_intr(void *arg)
  {
  	struct fwohci_softc *sc = (struct fwohci_softc *)arg;
  
 -	if (!(sc->intmask & OHCI_INT_EN)) {
 -		/* polling mode */
 -		return (FILTER_STRAY);
 -	}
 -	return (fwohci_check_stat(sc));
 +	fwohci_check_stat(sc);
  }
  
  void
 -fwohci_intr(void *arg)
 -{
 -	fwohci_filt(arg);
 -}
 -
 -void
  fwohci_poll(struct firewire_comm *fc, int quick, int count)
  {
  	struct fwohci_softc *sc = (struct fwohci_softc *)fc;
 @@ -2464,6 +2471,7 @@
  	struct fwohci_softc *sc;
  	uint32_t fun;
  
 +	FW_GLOCK(fc);
  	device_printf(fc->dev, "Initiate bus reset\n");
  	sc = (struct fwohci_softc *)fc;
  
 @@ -2487,6 +2495,7 @@
  	fun |= FW_PHY_ISBR | FW_PHY_RHB;
  	fun = fwphy_wrdata(sc, FW_PHY_ISBR_REG, fun);
  #endif
 +	FW_GUNLOCK(fc);
  }
  
  void
 Index: fwohcivar.h
 ===================================================================
 --- fwohcivar.h	(revision 186781)
 +++ fwohcivar.h	(working copy)
 @@ -37,12 +37,6 @@
  
  #include <sys/taskqueue.h>
  
 -#if defined(__DragonFly__) ||  __FreeBSD_version < 700043
 -#define FWOHCI_INTFILT	0
 -#else
 -#define FWOHCI_INTFILT	1
 -#endif
 -
  typedef struct fwohci_softc {
  	struct firewire_comm fc;
  	bus_space_tag_t bst;
 @@ -84,7 +78,6 @@
  } fwohci_softc_t;
  
  void fwohci_intr (void *arg);
 -int fwohci_filt (void *arg);
  int fwohci_init (struct fwohci_softc *, device_t);
  void fwohci_poll (struct firewire_comm *, int, int);
  void fwohci_reset (struct fwohci_softc *, device_t);
 
 --=-bdk1tSakYWl4EHnChn2C--
 

From: Dieter <freebsd@sopwith.solgatos.com>
To: bug-followup@FreeBSD.org, freebsd-firewire@FreeBSD.org
Cc:  
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing data to be lost 
Date: Mon, 05 Jan 2009 17:52:29 +0000

 In message <1231181797.21260.8.camel@localhost.localdomain>, Sean Bruno writes:
 
 > @@ -1388,6 +1387,7 @@
 >  	struct fw_device *fwdev;
 >  
 >  	s = splfw();
 > +	FW_GLOCK(fc);
 >  	fc->status = FWBUSEXPLORE;
 >  
 >  	/* Invalidate all devices, just after bus reset. */
 > @@ -1396,6 +1396,7 @@
 >  			fwdev->status = FWDEVINVAL;
 >  			fwdev->rcnt = 0;
 >  		}
 > +	FW_GUNLOCK(fc);
 >  	splx(s);
 >  
 >  	wakeup((void *)fc);
 
 If the (null these days) spl calls have been replaced with FW_GLOCK/FW_GUNLOCK
 calls, can the spl calls be removed now?
 
 > @@ -1922,6 +1924,8 @@
 >  	u_int i;
 >  	struct firewire_comm *fc = (struct firewire_comm *)sc;
 >  
 > +	FW_GLOCK(fc);
 > +	FW_GUNLOCK(fc);
 >  	if (stat & OHCI_INT_DMA_IR) {
 >  		irstat = atomic_readandclear_int(&sc->irstat);
 >  		for(i = 0; i < fc->nisodma ; i++){
 
 This needs to be reviewed by someone who understands what is protected by
 getting a lock and then immediately releasing it.  I have no clue.
 
 > @@ -1969,8 +1973,8 @@
 >  			OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCTIMER);
 >  #endif
 >  			OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_CYC_LOST);
 > -			device_printf(fc->dev, "too many cycle lost, "
 > -			 "no cycle master presents?\n");
 > +			device_printf(fc->dev, "%s: too many cycle lost, "
 > +			 "no cycle master presents?\n", __func__);
 >  		}
 
 Should this be "too many cycles lost, " and  "no cycle master present?\n"
 
 >  	uint32_t fun;
 >  
 > +	FW_GLOCK(fc);
 >  	device_printf(fc->dev, "Initiate bus reset\n");
 >  	sc = (struct fwohci_softc *)fc;
 >  
 > @@ -2487,6 +2495,7 @@
 >  	fun |= FW_PHY_ISBR | FW_PHY_RHB;
 >  	fun = fwphy_wrdata(sc, FW_PHY_ISBR_REG, fun);
 >  #endif
 > +	FW_GUNLOCK(fc);
 >  }
 
 Does the lock need to protect the printf?

From: Sean Bruno <sean.bruno@dsl-only.net>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: bug-followup@FreeBSD.org, freebsd-firewire@FreeBSD.org
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing
 data to be lost
Date: Mon, 05 Jan 2009 22:20:03 -0800

 On Mon, 2009-01-05 at 17:52 +0000, Dieter wrote:
 > In message <1231181797.21260.8.camel@localhost.localdomain>, Sean Bruno writes:
 > 
 > > @@ -1388,6 +1387,7 @@
 > >  	struct fw_device *fwdev;
 > >  
 > >  	s = splfw();
 > > +	FW_GLOCK(fc);
 > >  	fc->status = FWBUSEXPLORE;
 > >  
 > >  	/* Invalidate all devices, just after bus reset. */
 > > @@ -1396,6 +1396,7 @@
 > >  			fwdev->status = FWDEVINVAL;
 > >  			fwdev->rcnt = 0;
 > >  		}
 > > +	FW_GUNLOCK(fc);
 > >  	splx(s);
 > >  
 > >  	wakeup((void *)fc);
 > 
 > If the (null these days) spl calls have been replaced with FW_GLOCK/FW_GUNLOCK
 > calls, can the spl calls be removed now?
 > 
 They most definitely can be removed.  However, they are a roadmap to
 where locks SHOULD be.  I'm using them as guideposts through the code to
 see if I can figure out what's missing.
 
 > > @@ -1922,6 +1924,8 @@
 > >  	u_int i;
 > >  	struct firewire_comm *fc = (struct firewire_comm *)sc;
 > >  
 > > +	FW_GLOCK(fc);
 > > +	FW_GUNLOCK(fc);
 > >  	if (stat & OHCI_INT_DMA_IR) {
 > >  		irstat = atomic_readandclear_int(&sc->irstat);
 > >  		for(i = 0; i < fc->nisodma ; i++){
 > 
 > This needs to be reviewed by someone who understands what is protected by
 > getting a lock and then immediately releasing it.  I have no clue.
 > 
 Indeed, it's gross.  I only sent this patch over as a "test" of sorts to
 validate that the issue I'm looking into(locking in the firewire driver)
 is actually something related to what you reported in this issue.
 
 > > @@ -1969,8 +1973,8 @@
 > >  			OWRITE(sc, OHCI_LNKCTLCLR, OHCI_CNTL_CYCTIMER);
 > >  #endif
 > >  			OWRITE(sc, FWOHCI_INTMASKCLR,  OHCI_INT_CYC_LOST);
 > > -			device_printf(fc->dev, "too many cycle lost, "
 > > -			 "no cycle master presents?\n");
 > > +			device_printf(fc->dev, "%s: too many cycle lost, "
 > > +			 "no cycle master presents?\n", __func__);
 > >  		}
 > 
 > Should this be "too many cycles lost, " and  "no cycle master present?\n"
 > 
 Who knows.  I haven't reviewed this specific chunk of code to understand
 what it really means as per the FW specs.
 > >  	uint32_t fun;
 > >  
 > > +	FW_GLOCK(fc);
 > >  	device_printf(fc->dev, "Initiate bus reset\n");
 > >  	sc = (struct fwohci_softc *)fc;
 > >  
 > > @@ -2487,6 +2495,7 @@
 > >  	fun |= FW_PHY_ISBR | FW_PHY_RHB;
 > >  	fun = fwphy_wrdata(sc, FW_PHY_ISBR_REG, fun);
 > >  #endif
 > > +	FW_GUNLOCK(fc);
 > >  }
 > 
 > Does the lock need to protect the printf?
 
 These are gross, overpowered, way to heavy handed locks that I'm playing
 with.  I need to prevent pre-emption of certain events while they are in
 progress.  One of these events is the firewire's assertion of "bus
 reset" on the firewire device.  I see the h/w interrupt firing before
 this code can actually complete, causing the driver to be confused on
 occasion.
 
 
 Thanks for surveying this code with me, it helps to see what other
 people's eyes can pick up.  I hope to have this in working order
 soon-ish.
 
 Sean
 

From: Dieter <freebsd@sopwith.solgatos.com>
To: bug-followup@FreeBSD.org, freebsd-firewire@FreeBSD.org
Cc:  
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing data to be lost 
Date: Tue, 06 Jan 2009 09:33:21 +0000

 In message <1231222803.21260.17.camel@localhost.localdomain>, Sean Bruno writes:
 
 > > >  	uint32_t fun;
 > > >  
 > > > +	FW_GLOCK(fc);
 > > >  	device_printf(fc->dev, "Initiate bus reset\n");
 > > >  	sc = (struct fwohci_softc *)fc;
 > > >  
 > > > @@ -2487,6 +2495,7 @@
 > > >  	fun |= FW_PHY_ISBR | FW_PHY_RHB;
 > > >  	fun = fwphy_wrdata(sc, FW_PHY_ISBR_REG, fun);
 > > >  #endif
 > > > +	FW_GUNLOCK(fc);
 > > >  }
 > > 
 > > Does the lock need to protect the printf?
 > 
 > These are gross, overpowered, way to heavy handed locks that I'm playing
 > with.  I need to prevent pre-emption of certain events while they are in
 > progress.  One of these events is the firewire's assertion of "bus
 > reset" on the firewire device.  I see the h/w interrupt firing before
 > this code can actually complete, causing the driver to be confused on
 > occasion.
 
 I understand the basic concept of locking, or at least I did many many years
 ago when spls really were levels.  Since then they changed spls to not be levels
 (at least that's what the man page says) and later replaced with mutex.  I assume
 these changes are at least in part to better support SMP.
 
 What I don't understand is things like getting a lock and immediately releasing
 it, which appears to me to protect nothing.  Or why the printf needs to be inside
 the locked section of code.  I thought the goal was to hold a lock for as short a
 time as possible, and it is not clear to me why the printf needs to be protected.

From: Sean Bruno <sean.bruno@dsl-only.net>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: bug-followup@FreeBSD.org, freebsd-firewire@FreeBSD.org
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing
 data to be lost
Date: Sat, 10 Jan 2009 10:54:37 -0800

 On Tue, 2009-01-06 at 09:33 +0000, Dieter wrote:
 > In message <1231222803.21260.17.camel@localhost.localdomain>, Sean Bruno writes:
 > 
 > > > >  	uint32_t fun;
 > > > >  
 > > > > +	FW_GLOCK(fc);
 > > > >  	device_printf(fc->dev, "Initiate bus reset\n");
 > > > >  	sc = (struct fwohci_softc *)fc;
 > > > >  
 > > > > @@ -2487,6 +2495,7 @@
 > > > >  	fun |= FW_PHY_ISBR | FW_PHY_RHB;
 > > > >  	fun = fwphy_wrdata(sc, FW_PHY_ISBR_REG, fun);
 > > > >  #endif
 > > > > +	FW_GUNLOCK(fc);
 > > > >  }
 > > > 
 > > > Does the lock need to protect the printf?
 > > 
 > > These are gross, overpowered, way to heavy handed locks that I'm playing
 > > with.  I need to prevent pre-emption of certain events while they are in
 > > progress.  One of these events is the firewire's assertion of "bus
 > > reset" on the firewire device.  I see the h/w interrupt firing before
 > > this code can actually complete, causing the driver to be confused on
 > > occasion.
 > 
 > I understand the basic concept of locking, or at least I did many many years
 > ago when spls really were levels.  Since then they changed spls to not be levels
 > (at least that's what the man page says) and later replaced with mutex.  I assume
 > these changes are at least in part to better support SMP.
 > 
 > What I don't understand is things like getting a lock and immediately releasing
 > it, which appears to me to protect nothing.  Or why the printf needs to be inside
 > the locked section of code.  I thought the goal was to hold a lock for as short a
 > time as possible, and it is not clear to me why the printf needs to be protected.
 
 The printf DOES NOT need protection at all.  You are absolutely correct.
 
 The lock should be moved lower in the function for sure.  This is just
 my attempt to see if I am looking at the same problem you are reporting.
 Conceptually, I am trying to keep the h/w interrupt which fires during a
 bus reset from preempting this function and executing before this
 function is finished.
 
 The acquisition of the lock and release keeps the interrupt handler from
 executing before the setting of bus reset is finished.  I hope to fine
 tune the locks after I have confirmed operational goodness.
 
 Sean
 

From: Dieter <freebsd@sopwith.solgatos.com>
To: bug-followup@FreeBSD.org, freebsd-firewire@FreeBSD.org
Cc:  
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing data to be lost 
Date: Mon, 12 Jan 2009 15:04:29 +0000

 Sean,
 
 	I tried your patch, but bad news: a bus reset still
 tromps on Ethernet.
Responsible-Changed-From-To: freebsd-bugs->freebsd-firewire 
Responsible-Changed-By: gavin 
Responsible-Changed-When: Mon Feb 2 21:11:18 UTC 2009 
Responsible-Changed-Why:  
Over to maintainer(s).  This looks like it may be some bad 
interaction between the firewire stack and using a serial 
console.  To submitter: It may be worth while switching to 
uart(4) rather than sio(4) for your serial ports and seeing 
if that makes any difference (as I don't think uart(4) uses 
the Giant lock). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=118093 

From: Dieter <freebsd@sopwith.solgatos.com>
To: gavin@freebsd.org, bug-followup@freebsd.org
Cc: freebsd-bugs@freebsd.org, freebsd-firewire@freebsd.org
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing data to be lost 
Date: Wed, 04 Mar 2009 21:37:31 +0000

 In message <200902022113.n12LDfEI059994@freefall.freebsd.org>, gavin@freebsd.org writes:
 
 > This looks like it may be some bad
 > interaction between the firewire stack and using a serial
 > console.  To submitter: It may be worth while switching to
 > uart(4) rather than sio(4) for your serial ports and seeing
 > if that makes any difference (as I don't think uart(4) uses
 > the Giant lock).
 
 The box is running 7.1 now.  I commented out sio in the config file
 and built a new kernel.
 
 normal kernel:
 
  freebsd # dmesg | grep ^sio
 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 sio0: type 16550A, console
 sio0: [FILTER]
 sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
 sio1: type 16550A
 sio1: [FILTER]
  freebsd # dmesg | grep -i ^uart
  freebsd # dmesg | grep -i 1655 
 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 sio0: type 16550A, console
 sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
 sio1: type 16550A
  freebsd # dmesg | grep -i console
 sio0: type 16550A, console
 sc0: <System console> at flags 0x100 on isa0
 sc0: VGA <16 virtual consoles, flags=0x100>
  freebsd # dmesg | grep -i giant
 ohci0: [GIANT-LOCKED]
 ehci0: [GIANT-LOCKED]
 ohci1: [GIANT-LOCKED]
 ohci2: [GIANT-LOCKED]
 ehci1: [GIANT-LOCKED]
 ppc0: [GIANT-LOCKED]
 atkbd0: [GIANT-LOCKED]
 
 With sio commented out:
 
 OK boot kernel_uart
 /boot/kernel_uart/kernel text=0x3f8090 data=0x5b678+0x38cc8 syms=[0x8+0x6ee08+0x
 8+0x609a9]
 /
 
 And nothing else appears on the console.
 
 
  freebsd # dmesg | grep -i ^sio
  freebsd # dmesg | grep -i ^uart
 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 on acpi0
 uart0: [FILTER]
 uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
 uart1: [FILTER]
  freebsd # dmesg | grep -i 1655
 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 on acpi0
 uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
  freebsd # dmesg | grep -i console
 sc0: <System console> at flags 0x100 on isa0
 sc0: VGA <16 virtual consoles, flags=0x300>
 
 echo hello >> /dev/console
 doesn't show up.  Nothing shows up on console.
 
 And looking at files in /var/log I found:
 
 getty[789]: open /dev/ttyd0: No such file or directory
 getty[790]: open /dev/ttyd1: No such file or directory
 
 
 Looks to me like commenting out sio doesn't work so well on my box.
 Is that the wrong way to switch from sio to uart?
 Is there something else I need to change instead, or in addition?

From: Gavin Atkinson <gavin@FreeBSD.org>
To: Dieter <freebsd@sopwith.solgatos.com>
Cc: bug-followup@FreeBSD.org, freebsd-bugs@FreeBSD.org,
        freebsd-firewire@FreeBSD.org
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing
	data to be lost
Date: Fri, 06 Mar 2009 15:53:43 +0000

 On Wed, 2009-03-04 at 21:37 +0000, Dieter wrote:
 > In message <200902022113.n12LDfEI059994@freefall.freebsd.org>, gavin@freebsd.org writes:
 > 
 > > This looks like it may be some bad
 > > interaction between the firewire stack and using a serial
 > > console.  To submitter: It may be worth while switching to
 > > uart(4) rather than sio(4) for your serial ports and seeing
 > > if that makes any difference (as I don't think uart(4) uses
 > > the Giant lock).
 > 
 > The box is running 7.1 now.  I commented out sio in the config file
 > and built a new kernel.
 > 
 > normal kernel:
 > 
 >  freebsd # dmesg | grep ^sio
 > sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 > sio0: type 16550A, console
 > sio0: [FILTER]
 > sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
 > sio1: type 16550A
 > sio1: [FILTER]
 >  freebsd # dmesg | grep -i ^uart
 >  freebsd # dmesg | grep -i 1655 
 > sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 > sio0: type 16550A, console
 > sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
 > sio1: type 16550A
 >  freebsd # dmesg | grep -i console
 > sio0: type 16550A, console
 > sc0: <System console> at flags 0x100 on isa0
 > sc0: VGA <16 virtual consoles, flags=0x100>
 >  freebsd # dmesg | grep -i giant
 > ohci0: [GIANT-LOCKED]
 > ehci0: [GIANT-LOCKED]
 > ohci1: [GIANT-LOCKED]
 > ohci2: [GIANT-LOCKED]
 > ehci1: [GIANT-LOCKED]
 > ppc0: [GIANT-LOCKED]
 > atkbd0: [GIANT-LOCKED]
 > 
 > With sio commented out:
 > 
 > OK boot kernel_uart
 > /boot/kernel_uart/kernel text=0x3f8090 data=0x5b678+0x38cc8 syms=[0x8+0x6ee08+0x
 > 8+0x609a9]
 > /
 > 
 > And nothing else appears on the console.
 > 
 > 
 >  freebsd # dmesg | grep -i ^sio
 >  freebsd # dmesg | grep -i ^uart
 > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 on acpi0
 > uart0: [FILTER]
 > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
 > uart1: [FILTER]
 >  freebsd # dmesg | grep -i 1655
 > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 on acpi0
 > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
 >  freebsd # dmesg | grep -i console
 > sc0: <System console> at flags 0x100 on isa0
 > sc0: VGA <16 virtual consoles, flags=0x300>
 > 
 > echo hello >> /dev/console
 > doesn't show up.  Nothing shows up on console.
 > 
 > And looking at files in /var/log I found:
 > 
 > getty[789]: open /dev/ttyd0: No such file or directory
 > getty[790]: open /dev/ttyd1: No such file or directory
 > 
 > 
 > Looks to me like commenting out sio doesn't work so well on my box.
 > Is that the wrong way to switch from sio to uart?
 > Is there something else I need to change instead, or in addition?
 
 You'll also need to add the uart hints to your device.hints file, if you
 haven't already, and update /etc/ttys.
 
 hint.uart.0.at="isa"
 hint.uart.0.port="0x3F8"
 hint.uart.0.flags="0x10"
 hint.uart.0.irq="4"
 
 /etc/ttys: you may find you need to change "ttyd0" to "ttyu0".
 
 Gavin

From: Dieter <freebsd@sopwith.solgatos.com>
To: Gavin Atkinson <gavin@freebsd.org>
Cc: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Subject: Re: kern/118093: [firewire] firewire bus reset hogs CPU, causing data to be lost 
Date: Sun, 10 May 2009 11:20:29 +0100

 >  > > This looks like it may be some bad
 >  > > interaction between the firewire stack and using a serial
 >  > > console.  To submitter: It may be worth while switching to
 >  > > uart(4) rather than sio(4) for your serial ports and seeing
 >  > > if that makes any difference (as I don't think uart(4) uses
 >  > > the Giant lock).
 
 >  > Looks to me like commenting out sio doesn't work so well on my box.
 >  > Is that the wrong way to switch from sio to uart?
 >  > Is there something else I need to change instead, or in addition?
 >  
 >  You'll also need to add the uart hints to your device.hints file, if you
 >  haven't already, and update /etc/ttys.
 >  
 >  hint.uart.0.at="isa"
 >  hint.uart.0.port="0x3F8"
 >  hint.uart.0.flags="0x10"
 >  hint.uart.0.irq="4"
 >  
 >  /etc/ttys: you may find you need to change "ttyd0" to "ttyu0".
 >  
 >  Gavin
 
 Thanks, with those changes it works with uart now.
 
 The bad news is that switching from sio to uart didn't fix the
 CPU hogging problem.

From: Dieter <freebsd@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
Cc:  
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost 
Date: Tue, 26 Jan 2010 10:56:30 PST

 > Looks like the kernel printf takes a *lot* of CPU, and
 > somehow locks out Ethernet.
 
 Looking through man pages working on an unrelated problem, I found:
 
 man timed says:
 
      Messages printed by the kernel on the system console occur with inter-
      rupts disabled.  This means that the clock stops while they are printing.
      A machine with many disk or network hardware problems and consequent mes-
      sages cannot keep good time by itself.  Each message typically causes the
      clock to lose a dozen milliseconds.
 
 If a kernel printf disables Ethernet interrupts for 12 milliseconds,
 that would create the problem.

From: dieterbsd@engineer.com
To: bug-followup@FreeBSD.org
Cc: freebsd-firewire@freebsd.org, freebsd-usb@freebsd.org,
        freebsd-drivers@freebsd.org, freebsd-net@freebsd.org,
        freebsd@sopwith.solgatos.com
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost
Date: Thu, 03 Feb 2011 16:53:32 -0500

 New data: USB causes the same problem that firewire does.
 
 FreeBSD 8.0
 amd64
 nfe0: <NVIDIA nForce4 CK804 MCP9 Networking Adapter> port 0xb400-0xb407=20
 mem 0xfebf9000-0xfebf9fff irq 23 at device 10.0 on pci0
 bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev.=20
 0x4101> mem 0xfe4f0000-0xfe4fffff irq 19 at device 0.0 on pci5
 fwohci1: <NEC uPD72871/2> mem 0xfdeff000-0xfdefffff irq 19 at device=20
 8.0 on pci2
 ehci0: <NVIDIA nForce4 USB 2.0 controller> mem 0xfebfe000-0xfebfe0ff=20
 irq 22 at device 2.1 on pci0
 
 Plugging in a USB_to_RS-232 bridge generates:
 
 ugen0.2: <Prolific Technology Inc.> at usbus0
 uplcom0: <Prolific Technology Inc. USB-Serial Controller D, class 0/0,=20
 rev 1.10/
 4.00, addr 2> on usbus0
 
 And networking is locked out for too long, and data is lost.  (The node=20
 sending data
 via Ethernet has too small a transmit buffer, but is a closed source=20
 closed hardware
 black box and cannot be fixed.  Once data is lost it is lost forever,=20
 it cannot be
 recreated, so this is very bad.)  The FreeBSD box receiving data has a=20
 very large
 receive buffer, but if another device driver locks out networking it=20
 doesn't help.
 
 Changing the printf(9)s to log(9)s fixes the problem, but this is not a=20
 good workaround,
 there are LOTS of printfs in the kernel, I keep hitting new ones. =20
 Printf(9) alone isn't
 the problem, adding printfs to chown(2) does not cause the problem, but=20
 printfs from
 device drivers do. (usb, firewire, ...)
 
 My theory is there is a locking conflict.
 
 Here is lock profiling data for USB_RS-232 bridge being plugged in=20
 printfs locking out networking:
 
 debug.lock.prof.enable: 0
 debug.lock.prof.reset: 0
 debug.lock.prof.stats:
       max  wait_max       total  wait_total       count    avg wait_avg=20
 cnt_hold cnt_lock name
    251240         0      499982           0         303   1650      0  0=
 =20
       0 /usr/src/sys/dev/usb/usb_request.c:322 (sx:0123456789ABCDEF -=20
 USB device SX lock)
    104255         0      110429           0           2  55214      0  0=
 =20
       0 /usr/src/sys/dev/usb/usb_device.c:2459 (sx:123456789ABCDEF - USB=
 =20
 config SX lock)
 
    104217         0    29670949           0       47087    630      0  0=
 =20
       0 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)  =20
 <---------<<<<
 
    101962         0      103089           0          27   3818      0  0=
 =20
      0 /usr/src/sys/kern/kern_condvar.c:145 (sleep mutex:Giant)
     51071         0       81088           0           3  27029      0  0=
 =20
      0 /usr/src/sys/kern/kern_cons.c:422 (spin mutex:cnputs_mtx)
     23049         0       31351           0           2  15675      0  0=
 =20
      0 /usr/src/sys/kern/vfs_syscalls.c:3508 (lockmgr:ufs)
     12408         0      978544           0         971   1007      0  0=
 =20
      0 /usr/src/sys/kern/vfs_vnops.c:604 (lockmgr:ufs)
     11466         0      744372           0       11567     64      0  0=
 =20
      0 /usr/src/sys/kern/vfs_bio.c:2559 (lockmgr:bufwait)
     11141         0      501155           0        3850    130      0  0=
 =20
      0 /usr/src/sys/kern/vfs_bio.c:1835 (lockmgr:bufwait)
      6115         0       61720           0          26   2373      0  0=
 =20
       0 /usr/src/sys/dev/usb/usb_request.c:322 (sx:123456789ABCDEF - USB=
 =20
 device SX lock)
      5551         0       16749           0          19    881      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:syncer)
      3043         0        3749           0           6    624      0  0=
 =20
       0 /usr/src/sys/ufs/ffs/ffs_vfsops.c:1321 (sleep mutex:struct mount=
 =20
 mtx)
      2333         0        9901           0          36    275      0  0=
 =20
      0 /usr/src/sys/kern/vfs_mount.c:2231 (sleep mutex:struct mount mtx)
      2055         0        9462           0         100     94      0  0=
 =20
       0 /usr/src/sys/kern/subr_hints.c:117 (sleep mutex:kernel=20
 environment)
      1649         0       79272           0        3846     20      0  0=
 =20
      0 /usr/src/sys/kern/vfs_bio.c:2963 (sleep mutex:vm object)
      1636         0       36196           0       15515      2      0  0=
 =20
       0 /usr/src/sys/vm/vm_page.c:1052 (sleep mutex:vm page queue free=20
 mutex)
      1205         0       55601           0       15411      3      0  0=
 =20
      0 /usr/src/sys/kern/vfs_bio.c:2545 (sleep mutex:bufobj interlock)
      1076         0        1076           0           1   1076      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:ufs)
      1069         0       31117           0          30   1037      0  0=
 =20
      0 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:nfe0)
       848         0        3752           0        1359      2      0  0=
 =20
       0 /usr/src/sys/ufs/ffs/ffs_vfsops.c:1297 (sleep mutex:vnode=20
 interlock)
       754         0       10016           0        3846      2      0  0=
 =20
       0 /usr/src/sys/ufs/ffs/ffs_softdep.c:4274 (sleep mutex:Softdep=20
 Lock)
       739         0        2463           0           4    615      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:devfs)
       540         0        3370           0        1365      2      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:3179 (sleep mutex:vnode interlock)
       525         0       80852           0         160    505      0  0=
 =20
      0 /usr/src/sys/dev/uart/uart_cpu.h:92 (spin mutex:uart_hwmtx)
       512         0        2065           0          23     89      0  0=
 =20
      0 /usr/src/sys/kern/vfs_default.c:599 (lockmgr:bufwait)
       487         0       97058           0         481    201      0  0=
 =20
      0 /usr/src/sys/vm/vm_pager.c:311 (lockmgr:bufwait)
       467         0        2069           0          12    172      0  0=
 =20
      0 /usr/src/sys/dev/usb/usb_busdma.c:547 (sleep mutex:uplcom)
       455         0        2978           0          18    165      0  0=
 =20
      0 /usr/src/sys/dev/usb/usb_transfer.c:286 (sleep mutex:uplcom)
       423         0         727           0           2    363      0  0=
 =20
      0 /usr/src/sys/vm/uma_core.c:1565 (sleep mutex:UMA lock)
       367         0      230166           0       47087      4      0  0=
 =20
      0 /usr/src/sys/netinet/tcp_usrreq.c:708 (rw:tcpinp)
       358         0        1387           0          11    126      0  0=
 =20
      0 /usr/src/sys/kern/kern_synch.c:241 (sleep mutex:Giant)
       339         0       72085           0       46119      1      0  0=
 =20
      0 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:so_rcv)
       332         0      225434           0       93206      2      0  0=
 =20
      0 /usr/src/sys/kern/uipc_socket.c:1441 (sleep mutex:so_rcv)
       332         0       19529           0       11551      1      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:1927 (sleep mutex:bufobj interlock)
       331         0      162523           0       94887      1      0  0=
 =20
      0 /usr/src/sys/netinet/tcp_output.c:280 (sleep mutex:so_snd)
       330         0       73512           0       47087      1      0  0=
 =20
      0 /usr/src/sys/kern/uipc_socket.c:1843 (sleep mutex:so_rcv)
       329         0       88069           0       47771      1      0  0=
 =20
      0 /usr/src/sys/kern/uipc_socket.c:1687 (sleep mutex:so_rcv)
       329         0         348           0          12     29      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:1955 (sleep mutex:Syncer mtx)
       320         0       66185           0        3353     19      0  0=
 =20
      0 /usr/src/sys/kern/vfs_cluster.c:868 (lockmgr:bufwait)
       303         0        7410           0        4360      1      0  0=
 =20
      0 /usr/src/sys/kern/vfs_bio.c:3944 (sleep mutex:bufobj interlock)
 debug.lock.prof.rejected: 0
 debug.lock.prof.skipcount: 0
 debug.lock.prof.skipspin: 0
 
 Here is lock profiling data for firewire bus reset printfs locking out=20
 Ethernet:
 
       max  wait_max       total  wait_total       count    avg wait_avg=20
 cnt_hold cnt_lock name
    373345         0      397495           0         643    618      0  0=
 =20
      0 /usr/src/sys/kern/vfs_lookup.c:497 (lockmgr:ufs)
    373325         0      390815           0        3140    124      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:2218 (sleep mutex:vnode interlock)
 
    359514         0    32199808           0       45582    706      0  0=
 =20
       0 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)  =20
 <---------<<<<
 
     63827       320      746196         740       70067     10      0  0=
 =20
      4 /usr/src/sys/dev/firewire/fwohci.c:2107 (sleep mutex:firewire)
     49927         0      165929           0           8  20741      0  0=
 =20
      0 /usr/src/sys/kern/vfs_syscalls.c:3508 (lockmgr:ufs)
     33358         0     1707461           0        1106   1543      0  0=
 =20
      0 /usr/src/sys/kern/vfs_vnops.c:604 (lockmgr:ufs)
     30294         0      401224           0        2641    151      0  0=
 =20
      0 /usr/src/sys/vm/vm_map.c:3526 (sx:user map)
     28432         0      934681           0          74  12630      0  0=
 =20
      0 /usr/src/sys/kern/kern_cons.c:422 (spin mutex:cnputs_mtx)
     28296         0       66892           0        2641     25      0  0=
 =20
      0 /usr/src/sys/vm/vm_fault.c:297 (sleep mutex:vm object)
     25020         0       26667           0          45    592      0  0=
 =20
      0 /usr/src/sys/kern/kern_exit.c:825 (sx:proctree)
     19027         0     1225496           0       11937    102      0  0=
 =20
      0 /usr/src/sys/kern/vfs_bio.c:2559 (lockmgr:bufwait)
     18887         0      906297           0        4140    218      0  0=
 =20
      0 /usr/src/sys/kern/vfs_bio.c:1835 (lockmgr:bufwait)
     12101         0       23459           0        1675     14      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:2342 (sleep mutex:vnode interlock)
      7448         0      170841           0        2229     76      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:2083 (lockmgr:ufs)
      7342         0       21792           0        2169     10      0  0=
 =20
      0 /usr/src/sys/kern/vfs_cache.c:390 (rw:Name Cache)
      4648         0       10831           0          11    984      0  0=
 =20
      0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:syncer)
      3536         0       35352           0        9218      3      0  0=
 =20
      0 /usr/src/sys/amd64/amd64/pmap.c:3989 (sleep mutex:pmap)
      3039         0       34283           0        2641     12      0  0=
 =20
      0 /usr/src/sys/vm/vm_fault.c:937 (sleep mutex:vm object)
      2825         0       53949           0        3495     15      0  0=
 =20
      0 /usr/src/sys/vm/vm_fault.c:1010 (sleep mutex:vm object)
      2286         0        7169           0         547     13      0  0=
 =20
      0 /usr/src/sys/kern/kern_sig.c:983 (sleep mutex:process lock)
      2083      4010        9037        5380        2143      4      2  0=
 =20
      2 /usr/src/sys/kern/vfs_subr.c:2118 (sleep mutex:vnode interlock)
      2083        57        2089          57           4    522     14  0=
 =20
      1 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:pipe mutex)
      2022         0      181449           0         515    352      0  0=
 =20
      0 /usr/src/sys/vm/vm_pager.c:311 (lockmgr:bufwait)
      1997         0        3397           0           2   1698      0  0=
 =20
      0 /usr/src/sys/kern/kern_sysctl.c:1513 (sx:sysctl mem)
      1994         0        4890           0         117     41      0  0=
 =20
      0 /usr/src/sys/kern/kern_sysctl.c:1521 (sx:sysctl lock)
      1971         0        4156           0          13    319      0  0=
 =20
      0 /usr/src/sys/kern/kern_sysctl.c:1417 (sleep mutex:Giant)
      1965         0       31838           0        3471      9      0  0=
 =20
      0 /usr/src/sys/vm/vm_object.c:482 (sleep mutex:vm object)
      1926         0       17695           0          83    213      0  0=
 =20
      0 /usr/src/sys/vm/vm_object.c:541 (sleep mutex:vm object)
 
 
 There is at least one line in common that looks networking related,=20
 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)
 
 int
 sblock(struct sockbuf *sb, int flags)
 {
 
         KASSERT((flags & SBL_VALID) =3D=3D flags,
             ("sblock: flags invalid (0x%x)", flags));
 
         if (flags & SBL_WAIT) {
                 if ((sb->sb_flags & SB_NOINTR) ||
                     (flags & SBL_NOINTR)) {
                         sx_xlock(&sb->sb_sx);
                         return (0);
                 }
                 return (sx_xlock_sig(&sb->sb_sx));
         } else {
                 if (sx_try_xlock(&sb->sb_sx) =3D=3D 0)
                         return (EWOULDBLOCK);
                 return (0);
         }
 }
 
 It is not clear to me what this code is doing. (comments might help)
 It is not clear to me why networking and a USB_to_RS-232 bridge would
 have a lock conflict. It is not clear to me why networking and firewire
 would have a lock conflict.
 
 How do we fix this?
 
 
>Unformatted:
