From nobody@FreeBSD.org  Wed Jul 26 09:28:17 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B1B6016A4E0
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 26 Jul 2006 09:28:17 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F0EC943D46
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 26 Jul 2006 09:28:10 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k6Q9SA10091580
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 26 Jul 2006 09:28:10 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k6Q9SAtx091573;
	Wed, 26 Jul 2006 09:28:10 GMT
	(envelope-from nobody)
Message-Id: <200607260928.k6Q9SAtx091573@www.freebsd.org>
Date: Wed, 26 Jul 2006 09:28:10 GMT
From: Roar Throns <roart@nvg.ntnu.no>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [bce] Broadcom bce driver and SMP hangup
X-Send-Pr-Version: www-2.3

>Number:         100858
>Category:       kern
>Synopsis:       [bce] Broadcom bce driver and SMP hangup
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-net
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jul 26 09:30:12 GMT 2006
>Closed-Date:    Wed Jul 03 00:52:41 UTC 2013
>Last-Modified:  Wed Jul 03 00:52:41 UTC 2013
>Originator:     Roar Throns
>Release:        FreeBSD 6.1/amd64
>Organization:
>Environment:
FreeBSD d 6.1-STABLE-200607 FreeBSD 6.1-STABLE-200607 #6: Wed Jul 26 08:21:18 UTC 2006     root@d:/usr/obj/usr/src/sys/A  amd64
>Description:
Using a Dell PowerEdge 1950 with 2 dual-core Xeons (Id = 0xf64).
It has an mpt0: <LSILogic SAS Adapter>, and it probes with 6.1-STABLE.
It also has 2 Broadcom NetXtreme II BCM5708C 1000Base-T (B1) interfaces.
Bus: bce0: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
I am using the 0.9.6 version with the 1.6 2006/07/20 revision of the driver.
(Have also used the 0.9.5 driver with 1.2.2.2 revision)

With a kernel that has bce built in, the machine locks up while booting. (Both versions.)

With a kernel that does not have bce built in, it locks up for a while
after the module is loaded (only tested with 0.9.6),
then seems to work normally.
When rebooting with 2 bce interfaces, it locks up after printing the system uptime.

The problem does not appear when booted with no APIC (and no SMP).

>How-To-Repeat:
Use bce with SMP.
>Fix:

>Release-Note:
>Audit-Trail:

From: "Dmitry Petrov" <DPetrov@nchcapital.com>
To: bug-followup@FreeBSD.org, roart@nvg.ntnu.no
Cc:  
Subject: Re: kern/100858: [bce] Broadcom bce driver and SMP hangup
Date: Wed, 16 Aug 2006 10:15:03 -0400

 I have a similar problem with the bce driver on FreeBSD 6.1-STABLE. I tried it with and without
 SMP with the same result. The machine does not freeze; only the network interface goes
 up/down, which makes the whole system unusable:
 
 Aug 16 09:24:23 nch1 kernel: bce0: link state changed to DOWN
 Aug 16 09:24:26 nch1 kernel: bce0: link state changed to UP
 Aug 16 09:26:57 nch1 kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 Aug 16 09:26:57 nch1 kernel: bce0: link state changed to DOWN
 Aug 16 09:27:00 nch1 kernel: bce0: link state changed to UP
 Aug 16 09:28:13 nch1 kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 Aug 16 09:28:13 nch1 kernel: bce0: link state changed to DOWN
 Aug 16 09:28:16 nch1 kernel: bce0: link state changed to UP
 Aug 16 09:30:13 nch1 kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 Aug 16 09:30:13 nch1 kernel: bce0: link state changed to DOWN
 Aug 16 09:30:16 nch1 kernel: bce0: link state changed to UP
 Aug 16 09:32:06 nch1 kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 Aug 16 09:32:06 nch1 kernel: bce0: link state changed to DOWN
 Aug 16 09:32:09 nch1 kernel: bce0: link state changed to UP
 Aug 16 09:34:28 nch1 kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 Aug 16 09:34:28 nch1 kernel: bce0: link state changed to DOWN
 Aug 16 09:34:31 nch1 kernel: bce0: link state changed to UP
 Aug 16 09:36:05 nch1 kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 Aug 16 09:36:05 nch1 kernel: bce0: link state changed to DOWN
 Aug 16 09:36:08 nch1 kernel: bce0: link state changed to UP
 Aug 16 09:37:18 nch1 kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 
 It usually works for some time (hours)  after reboot before it starts happening.
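
As a rough way to quantify how often the resets in a log like the one above occur, the following sketch scans syslog lines for the watchdog message and returns the gap, in seconds, between consecutive resets per interface. It is not part of the bce driver or of this report: the function name, the regular expression, and the default year are illustrative assumptions. Syslog timestamps carry no year, so one has to be supplied, and month names are parsed assuming an English/C locale.

```python
import re
from datetime import datetime

# Hypothetical pattern for syslog lines such as:
#   Aug 16 09:26:57 nch1 kernel: bce0: ...: Watchdog timeout occurred, resetting!
# The hostname field is optional because some logs omit it.
WATCHDOG_RE = re.compile(
    r"^\s*(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+.*"
    r"(?P<ifname>bce\d+):.*Watchdog timeout occurred"
)


def watchdog_intervals(lines, year=2006):
    """Return {interface: [seconds between consecutive watchdog resets]}.

    `year` is an assumption supplied by the caller, since syslog
    timestamps do not include one.
    """
    last_seen = {}   # interface name -> time of the previous reset
    intervals = {}   # interface name -> list of gaps in seconds
    for line in lines:
        m = WATCHDOG_RE.match(line)
        if not m:
            continue  # link state changes and other messages are ignored
        stamp = " ".join(m.group("stamp").split())  # normalize spacing
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        ifname = m.group("ifname")
        if ifname in last_seen:
            gap = int((when - last_seen[ifname]).total_seconds())
            intervals.setdefault(ifname, []).append(gap)
        last_seen[ifname] = when
    return intervals


if __name__ == "__main__":
    import sys

    # Usage: python watchdog_intervals.py < /var/log/messages
    for name, gaps in watchdog_intervals(sys.stdin).items():
        print(name, gaps)
```

Gaps that stay in the range of one to a few minutes, as in the excerpt above, would suggest the interface never recovers for long once the problem starts.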
 
 Coincidentally, the machine I have is also a Dell PowerEdge 1950 with dual Broadcom
 NetXtreme II BCM5708C 1000Base-T interfaces (but only one is configured currently).
 
 I saw a similar error mentioned elsewhere in other lists. E.g. check freebsd-current:
 
 http://lists.freebsd.org/pipermail/freebsd-current/2006-August/065001.html
 
 Another thing I noticed (which may or may not be related, but is worth mentioning as it
 involves ACPI) is that sometimes the machine does not reboot on the "reboot" command; instead
 it seems to stay in a halted state.
 
 -- 
 Dmitry Petrov
 phone: (212) 641-3235
 
 
 
 
Responsible-Changed-From-To: freebsd-bugs->davidch 
Responsible-Changed-By: glebius 
Responsible-Changed-When: Sun Sep 3 13:29:51 UTC 2006 
Responsible-Changed-Why:  
Assign to driver maintainer. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100858 

From: "Mahlon E. Smith" <mahlon@martini.nu>
To: bug-followup@FreeBSD.org, roart@nvg.ntnu.no
Cc:  
Subject: Re: kern/100858: [bce] Broadcom bce driver and SMP hangup
Date: Sun, 10 Sep 2006 18:00:03 -0700

 I'm seeing this too on a new batch of 1950s.
 
 I am -NOT- running an SMP kernel, just upgraded to -STABLE last Friday
 (the 8th) and still on GENERIC.
 
 The network is stable enough if not accessing NFS.  Anything over a meg
 of traffic over an NFS mount kills it.  Ouch.
 
 Sep 10 19:41:41  kernel: nfs server ...:/usr/local/data/freebsd/ports: not responding
 Sep 10 19:41:41  kernel: nfs server ...: not responding
 Sep 10 19:41:46  kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 Sep 10 19:41:46  kernel: bce0: link state changed to DOWN
 Sep 10 19:41:48  kernel: bce0: link state changed to UP
 Sep 10 19:42:37  ntpd[479]: sendto(10.1.1.5): Network is unreachable
 Sep 10 19:42:43  kernel: nfs server ...: not responding
 Sep 10 19:43:14  last message repeated 2 times
 Sep 10 19:43:14  kernel: nfs server ...: not responding
 Sep 10 19:43:37  kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 
 ... etc.
 
 --
 Mahlon E. Smith  
 mahlon@martini.nu | http://www.martini.nu/

From: Mark Blackman <mark@exonetric.com>
To: bug-followup@FreeBSD.org, roart@nvg.ntnu.no
Cc:  
Subject: Re: kern/100858: [bce] Broadcom bce driver and SMP hangup
Date: Mon, 9 Oct 2006 15:05:47 +0100

 Hi,
 
 in the FWIW dept., I'm seeing precisely the same issue as outlined in the other reports
 for this PR, Dell Poweredge 1950.
 
 bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting!
 bce0: link state changed to DOWN
 bce0: link state changed to UP
 
 
 I can trigger it by running bonnie on an NFS client against the exported filesystem,
 but only if it's mounted with UDP transport. The bonnie tests succeed with no issues
 if a TCP transport is used. So, as a workaround, perhaps re-mount with TCP transport.
 
 i.e. bonnie -u nobody -f
 
 Cheers,
 Mark Blackman
 Exonetric
 

From: Travis Pugh <tdp@eng.mstarmetro.net>
To: bug-followup@FreeBSD.org,  roart@nvg.ntnu.no
Cc:  
Subject: Re: kern/100858: [bce] Broadcom bce driver and SMP hangup
Date: Fri, 27 Oct 2006 16:04:45 -0600

 I am also experiencing the same behavior on STABLE, with a Dell 
 PowerEdge 1950.
 
 With the NICs enabled, and APIC enabled, the boot process stops at:
 
 Linux ELF exec handler installed
 lo0: bpf attached
 rr232x: no controller detected.
 
 With the NICs disabled in BIOS, the boot process goes normally:
 
 Linux ELF exec handler installed
 lo0: bpf attached
 rr232x: no controller detected.
 mpt0: mpt_cam_event: 0x16
 mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
 ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA33 cable=40 wire
 acd0: setting PIO4 on 63XXESB2 chip
 acd0: setting UDMA33 on 63XXESB2 chip
 acd0: <TEAC CD-ROM CD-224E-N/3.AB> CDROM drive at ata0 as master
 acd0: read 4134KB/s (4134KB/s), 256KB buffer, UDMA33
 acd0: Reads: CDR, CDRW, CDDA stream, packet
 acd0: Writes:
 acd0: Audio: play, 256 volume levels
 acd0: Mechanism: ejectable tray, unlocked
 acd0: Medium: no/blank disc
 (probe0:mpt0:0:0:0): Retrying Command
 (probe0:mpt0:0:0:0): error 22
 (probe0:mpt0:0:0:0): Unretryable Error
 (probe0:mpt0:0:0:1): error 22
 (probe0:mpt0:0:0:1): Unretryable Error
 (probe0:mpt0:0:0:2): error 22
 (probe0:mpt0:0:0:2): Unretryable Error
 (probe0:mpt0:0:0:3): error 22
 (probe0:mpt0:0:0:3): Unretryable Error
 (probe0:mpt0:0:0:4): error 22
 (probe0:mpt0:0:0:4): Unretryable Error
 (probe0:mpt0:0:0:5): error 22
 (probe0:mpt0:0:0:5): Unretryable Error
 (probe0:mpt0:0:0:6): error 22
 (probe0:mpt0:0:0:6): Unretryable Error
 (probe0:mpt0:0:0:7): error 22
 (probe0:mpt0:0:0:7): Unretryable Error
 (probe8:mpt0:0:8:0): Retrying Command
 (probe8:mpt0:0:8:0): error 22
 (probe8:mpt0:0:8:0): Unretryable Error
 (probe8:mpt0:0:8:0): error 22
 (probe8:mpt0:0:8:0): Unretryable Error
 (probe0:mpt0:0:8:1): Unexpected Bus Free
 (probe0:mpt0:0:8:1): Retrying Command
 (probe0:mpt0:0:8:1): Unexpected Bus Free
 (probe0:mpt0:0:8:1): Retrying Command
 (probe0:mpt0:0:8:1): Unexpected Bus Free
 (probe0:mpt0:0:8:1): Retrying Command
 (probe0:mpt0:0:8:1): Unexpected Bus Free
 (probe0:mpt0:0:8:1): Retrying Command
 (probe0:mpt0:0:8:1): Unexpected Bus Free
 (probe0:mpt0:0:8:1): error 5
 (probe0:mpt0:0:8:1): Retries Exausted
 (probe0:mpt0:0:8:2): Unexpected Bus Free
 (probe0:mpt0:0:8:2): Retrying Command
 (probe0:mpt0:0:8:2): Unexpected Bus Free
 (probe0:mpt0:0:8:2): Retrying Command
 (probe0:mpt0:0:8:2): Unexpected Bus Free
 (probe0:mpt0:0:8:2): Retrying Command
 (probe0:mpt0:0:8:2): Unexpected Bus Free
 (probe0:mpt0:0:8:2): Retrying Command
 (probe0:mpt0:0:8:2): Unexpected Bus Free
 (probe0:mpt0:0:8:2): error 5
 (probe0:mpt0:0:8:2): Retries Exausted
 (probe0:mpt0:0:8:3): Unexpected Bus Free
 (probe0:mpt0:0:8:3): Retrying Command
 (probe0:mpt0:0:8:3): Unexpected Bus Free
 (probe0:mpt0:0:8:3): Retrying Command
 (probe0:mpt0:0:8:3): Unexpected Bus Free
 (probe0:mpt0:0:8:3): Retrying Command
 (probe0:mpt0:0:8:3): Unexpected Bus Free
 (probe0:mpt0:0:8:3): Retrying Command
 (probe0:mpt0:0:8:3): Unexpected Bus Free
 (probe0:mpt0:0:8:3): error 5
 (probe0:mpt0:0:8:3): Retries Exausted
 (probe0:mpt0:0:8:4): Unexpected Bus Free
 (probe0:mpt0:0:8:4): Retrying Command
 (probe0:mpt0:0:8:4): Unexpected Bus Free
 (probe0:mpt0:0:8:4): Retrying Command
 (probe0:mpt0:0:8:4): Unexpected Bus Free
 (probe0:mpt0:0:8:4): Retrying Command
 (probe0:mpt0:0:8:4): Unexpected Bus Free
 (probe0:mpt0:0:8:4): Retrying Command
 (probe0:mpt0:0:8:4): Unexpected Bus Free
 (probe0:mpt0:0:8:4): error 5
 (probe0:mpt0:0:8:4): Retries Exausted
 (probe0:mpt0:0:8:5): Unexpected Bus Free
 (probe0:mpt0:0:8:5): Retrying Command
 (probe0:mpt0:0:8:5): Unexpected Bus Free
 (probe0:mpt0:0:8:5): Retrying Command
 (probe0:mpt0:0:8:5): Unexpected Bus Free
 (probe0:mpt0:0:8:5): Retrying Command
 (probe0:mpt0:0:8:5): Unexpected Bus Free
 (probe0:mpt0:0:8:5): Retrying Command
 (probe0:mpt0:0:8:5): Unexpected Bus Free
 (probe0:mpt0:0:8:5): error 5
 (probe0:mpt0:0:8:5): Retries Exausted
 (probe0:mpt0:0:8:6): Unexpected Bus Free
 (probe0:mpt0:0:8:6): Retrying Command
 (probe0:mpt0:0:8:6): Unexpected Bus Free
 (probe0:mpt0:0:8:6): Retrying Command
 (probe0:mpt0:0:8:6): Unexpected Bus Free
 (probe0:mpt0:0:8:6): Retrying Command
 (probe0:mpt0:0:8:6): Unexpected Bus Free
 (probe0:mpt0:0:8:6): Retrying Command
 (probe0:mpt0:0:8:6): Unexpected Bus Free
 (probe0:mpt0:0:8:6): error 5
 (probe0:mpt0:0:8:6): Retries Exausted
 (probe0:mpt0:0:8:7): Unexpected Bus Free
 (probe0:mpt0:0:8:7): Retrying Command
 (probe0:mpt0:0:8:7): Unexpected Bus Free
 (probe0:mpt0:0:8:7): Retrying Command
 (probe0:mpt0:0:8:7): Unexpected Bus Free
 (probe0:mpt0:0:8:7): Retrying Command
 (probe0:mpt0:0:8:7): Unexpected Bus Free
 (probe0:mpt0:0:8:7): Retrying Command
 (probe0:mpt0:0:8:7): Unexpected Bus Free
 (probe0:mpt0:0:8:7): error 5
 (probe0:mpt0:0:8:7): Retries Exausted
 pass0 at mpt0 bus 0 target 0 lun 0
 pass0: <ATA WDC WD1600JS-75N 2E04> Fixed Direct Access SCSI-5 device
 pass0: Serial Number      WD-WCANM5937301
 pass0: 300.000MB/s transfers, Tagged Queueing Enabled
 pass1 at mpt0 bus 0 target 8 lun 0
 pass1: <DP BACKPLANE 1.00> Fixed Enclosure Services SCSI-5 device
 pass1: 300.000MB/s transfers, Tagged Queueing Enabled
 ses0 at mpt0 bus 0 target 8 lun 0
 ses0: <DP BACKPLANE 1.00> Fixed Enclosure Services SCSI-5 device
 ses0: 300.000MB/s transfers, Tagged Queueing Enabled
 ses0: SCSI-3 SES Device
 GEOM: new disk da0
 ATA PseudoRAID loaded
 SMP: AP CPU #1 Launched!
 cpu1 AP:
       ID: 0x02000000   VER: 0x00050014 LDR: 0x00000000 DFR: 0xffffffff
    lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
    timer: 0x000200ef therm: 0x00010000 err: 0x00010000 pcm: 0x00010000
 SMP: AP CPU #3 Launched!
 cpu3 AP:
       ID: 0x06000000   VER: 0x00050014 LDR: 0x00000000 DFR: 0xffffffff
    lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
    timer: 0x000200ef therm: 0x00010000 err: 0x00010000 pcm: 0x00010000
 SMP: AP CPU #2 Launched!
 cpu2 AP:
       ID: 0x04000000   VER: 0x00050014 LDR: 0x00000000 DFR: 0xffffffff
    lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
    timer: 0x000200ef therm: 0x00010000 err: 0x00010000 pcm: 0x00010000
 INTR: Assigning IRQ 3 to local APIC 0
 ioapic0: Assigning ISA IRQ 3 to local APIC 0
 INTR: Assigning IRQ 4 to local APIC 2
 ioapic0: Assigning ISA IRQ 4 to local APIC 2
 INTR: Assigning IRQ 9 to local APIC 4
 ioapic0: Assigning ISA IRQ 9 to local APIC 4
 INTR: Assigning IRQ 14 to local APIC 6
 ioapic0: Assigning ISA IRQ 14 to local APIC 6
 INTR: Assigning IRQ 15 to local APIC 0
 ioapic0: Assigning ISA IRQ 15 to local APIC 0
 INTR: Assigning IRQ 20 to local APIC 2
 ioapic0: Assigning PCI IRQ 20 to local APIC 2
 INTR: Assigning IRQ 21 to local APIC 4
 ioapic0: Assigning PCI IRQ 21 to local APIC 4
 INTR: Assigning IRQ 64 to local APIC 6
 ioapic1: Assigning PCI IRQ 64 to local APIC 6
 da0 at mpt0 bus 0 target 0 lun 0
 da0: <ATA WDC WD1600JS-75N 2E04> Fixed Direct Access SCSI-5 device
 da0: Serial Number      WD-WCANM5937301
 da0: 300.000MB/s transfers, Tagged Queueing Enabled
 da0: 152587MB (312500000 512 byte sectors: 255H 63S/T 19452C)
 Trying to mount root from ufs:/dev/da0s1a
 start_init: trying /sbin/init

From: Tom Judge <tom@tomjudge.com>
To: bug-followup@FreeBSD.org, roart@nvg.ntnu.no
Cc:  
Subject: Re: kern/100858: [bce] Broadcom bce driver and SMP hangup
Date: Tue, 27 Oct 2009 21:52:08 +0000

 This bug should not be present in modern releases (7.0+).
 
 Tom
State-Changed-From-To: open->feedback 
State-Changed-By: linimon 
State-Changed-When: Sun Feb 24 22:49:53 UTC 2013 
State-Changed-Why:  
Is this still a problem on modern releases? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100858 
State-Changed-From-To: feedback->closed 
State-Changed-By: linimon 
State-Changed-When: Wed Jul 3 00:50:32 UTC 2013 
State-Changed-Why:  
Feedback timeout. 


Responsible-Changed-From-To: davidch->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed Jul 3 00:50:32 UTC 2013 
Responsible-Changed-Why:  
commit bit has been taken in for safekeeping. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=100858 
>Unformatted:
