From nobody@FreeBSD.org  Thu Jan 27 09:27:12 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B7492106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 27 Jan 2011 09:27:12 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id A6EA18FC20
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 27 Jan 2011 09:27:12 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p0R9RB6D096981
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 27 Jan 2011 09:27:11 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p0R9RB1H096980;
	Thu, 27 Jan 2011 09:27:11 GMT
	(envelope-from nobody)
Message-Id: <201101270927.p0R9RB1H096980@red.freebsd.org>
Date: Thu, 27 Jan 2011 09:27:11 GMT
From: Dmitry Dolzenko <dol@ngcom.ru>
To: freebsd-gnats-submit@FreeBSD.org
Subject: HP DL360 G6 server hang when ssh transmit large  amounts of data
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         154331
>Category:       amd64
>Synopsis:       [hang] HP DL360 G6 server hang when ssh transmit large  amounts of data
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    yongari
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jan 27 09:30:09 UTC 2011
>Closed-Date:    Tue Sep 06 00:53:45 UTC 2011
>Last-Modified:  Tue Sep 06 00:53:45 UTC 2011
>Originator:     Dmitry Dolzenko
>Release:        8.1-RELEASE-p2
>Organization:
NG
>Environment:
FreeBSD srfilm.sernxx.ru 8.1-RELEASE-p2 FreeBSD 8.1-RELEASE-p2 #3: Wed Jan 26 14:25:18 MSK 2011     root@srfilm.sernxx.ru:/usr/obj/usr/src/sys/KERN  amd64
>Description:
The Hewlett Packard DL360 G6 server hangs without a panic or exception errors on the screen.
I connect to the problematic server via ssh from a FreeBSD or Windows computer.
When I run "find /usr/ports", the DL360 server hangs completely after 3-4 seconds, without error or exception messages. I cannot switch consoles with Alt+F1 / Alt+F2. Only the Caps Lock LED still responds when I press the Caps Lock key.

When I start "find /usr/ports" from the local console, everything works fine. The problem appeared after upgrading from 8.0 to 8.1-p2.
Previously, 8.0-RELEASE ran on this server for about 9 months without problems.

The DL360 G6 has 2 bce LAN interfaces and a ciss RAID controller.

Server configuration:
DL360G6 1x QC E5504 (2.0 GHz, 4 MB L3 Cache, 80Watt DDR3-800) 4 (2x2)GB Memory, P410i Z , 2 x 500GB 

>How-To-Repeat:
1. Connect remotely via ssh.
2. Run "find /usr/ports".
The server hangs after 3-4 seconds.
>Fix:


>Release-Note:
>Audit-Trail:

From: Dmitry Dolzenko <dol@ngcom.ru>
To: bug-followup@FreeBSD.org, dol@ngcom.ru
Cc:  
Subject: Re: amd64/154331: [hang] HP DL360 G6 server hang when ssh transmit large  amounts of data
Date: Sat, 29 Jan 2011 23:20:48 +0300

 Hello,
 
 The problem also exists in version 8.2-RC2.
 I updated the system to 8.2-RC2, connected via ssh, and ran "find /usr/ports".
 After 5-10 seconds the connection was terminated and I saw a page fault error on
 the server console.
 -------
 Fatal  trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 02
 fault virtual address = 0x18
 fault code = supervisor read data, page not present
 instruction pointer = 0x20:0xffffffff8030bc51
 stack pointer = 0x28:0xffffff81b1e7d150
 frame pointer = 0x28:0xffffff81b1e7d190
 code segment = base 0x0, limit 0xfffff, type 0x1b
                 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags = interrupt enabled, resume, IOPL = 0
 current process = 12 (irq257: bce0)
 trap number = 12
 panic: page fault
 cpuid = 1
 uptime 25m5s
 -------
 
 The problem is probably related to the bce driver.
 
 With best regards,
  Dmitry Dolzenko.
 
State-Changed-From-To: open->feedback 
State-Changed-By: yongari 
State-Changed-When: Wed Feb 2 02:04:43 UTC 2011 
State-Changed-Why:  
Would you show me both the dmesg and "ifconfig bce0" output? 
It would be even better to get backtraces to know where the panic 
occurred. Without that it's really hard to guess what caused the 
panic. Please see the following URL for how to get a kernel backtrace. 
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html 


Responsible-Changed-From-To: freebsd-amd64->yongari 
Responsible-Changed-By: yongari 
Responsible-Changed-When: Wed Feb 2 02:04:43 UTC 2011 
Responsible-Changed-Why:  
Grab. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=154331 

From: Dmitry Dolzenko <dol@mig.phys.msu.ru>
To: bug-followup@FreeBSD.org, dol@ngcom.ru
Cc:  
Subject: Re: amd64/154331: [hang] HP DL360 G6 server hang when ssh transmit large  amounts of data
Date: Fri, 11 Feb 2011 16:28:31 +0300

 The server has now been downgraded to 8_0_RELEASE.
 The server works OK.
 After the downgrade I see these messages on the console.
 
 Feb 11 16:17:34 srfilm kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(6523): TSO enabled for non-TCP frame!.
 Feb 11 16:18:07 srfilm last message repeated 12 times
 Feb 11 16:18:09 srfilm kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(6523): TSO enabled for non-TCP frame!.
 
 
 This is the dmesg output (after downgrading to 8_0_RELEASE):
 
 bce0: <HP NC382i DP Multifunction Gigabit Server Adapter (C0)> mem 0xf8000000-0xf9ffffff irq 31 at device 0.0 on pci2
 miibus0: <MII bus> on bce0
 brgphy0: <BCM5709C 10/100/1000baseTX PHY> PHY 1 on miibus0
 brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
 bce0: Ethernet address: 00:26:55:82:78:cc
 bce0: [ITHREAD]
 
 
 ifconfig
 
 bce0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
         ether 00:26:55:82:78:cc
         inet 81.xx.xx.186 netmask 0xffffff00 broadcast 81.xx.xx.255
         media: Ethernet autoselect (100baseTX <full-duplex>)
         status: active
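
The options= line in the ifconfig output above lists TSO4 among the enabled capabilities. As a minimal sketch, the check below tests captured ifconfig output for that capability. The helper name has_tso4 is hypothetical, and nothing in this report establishes that TSO is related to the hang; it is only one setting that can be ruled in or out while testing.

```sh
#!/bin/sh
# has_tso4 (hypothetical helper): succeed if ifconfig output read from
# stdin lists TSO4 in the capability set of its "options=" line.
# Reading stdin instead of calling ifconfig directly lets the check run
# against saved output as well as a live interface.
has_tso4() {
    grep 'options=' | grep -Eq '[<,]TSO4[,>]'
}
```

On the affected machine this could be used as `ifconfig bce0 | has_tso4 && echo "TSO4 is on"`. According to ifconfig(8), `ifconfig bce0 -tso` turns TCP segmentation offload off for a test run.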
 
 Before downgrading I tried to save a crash dump 2 times.
 The first time, the dump aborted with "error 5".
 The second time, the server hung without producing a dump.
 

From: YongHyeon PYUN <pyunyh@gmail.com>
To: Dmitry Dolzenko <dol@mig.phys.msu.ru>
Cc: yongari@freebsd.org, bug-followup@FreeBSD.org
Subject: Re: amd64/154331: [hang] HP DL360 G6 server hang when ssh transmit large  amounts of data
Date: Thu, 19 May 2011 10:46:34 -0700

 On Fri, Feb 11, 2011 at 01:50:09PM +0000, Dmitry Dolzenko wrote:
 
 [...]
 
 >  The server has now been downgraded to 8_0_RELEASE.
 >  The server works OK.
 >  After the downgrade I see these messages on the console.
 >  
 >  Feb 11 16:17:34 srfilm kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(6523): TSO enabled for non-TCP frame!.
 >  Feb 11 16:18:07 srfilm last message repeated 12 times
 >  Feb 11 16:18:09 srfilm kernel: bce0: /usr/src/sys/dev/bce/if_bce.c(6523): TSO enabled for non-TCP frame!.
 >  
 
 I guess this came from a bug in 8.0-RELEASE bce(4). I believe this
 was fixed a long time ago, though.
 >  
 >  This is the dmesg output (after downgrading to 8_0_RELEASE):
 >  
 >  bce0: <HP NC382i DP Multifunction Gigabit Server Adapter (C0)> mem 0xf8000000-0xf9ffffff irq 31 at device 0.0 on pci2
 >  miibus0: <MII bus> on bce0
 >  brgphy0: <BCM5709C 10/100/1000baseTX PHY> PHY 1 on miibus0
 >  brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
 >  bce0: Ethernet address: 00:26:55:82:78:cc
 >  bce0: [ITHREAD]
 >  
 >  
 >  ifconfig
 >  
 >  bce0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >          options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
 >          ether 00:26:55:82:78:cc
 >          inet 81.xx.xx.186 netmask 0xffffff00 broadcast 81.xx.xx.255
 >          media: Ethernet autoselect (100baseTX <full-duplex>)
 >          status: active
 >  
 >  Before downgrading I tried to save a crash dump 2 times.
 >  The first time, the dump aborted with "error 5".
 >  The second time, the server hung without producing a dump.
 >  
 
 Without a back trace it's nearly impossible to narrow down
 what the problem was. If you can't dump core, use a serial console to
 capture the back trace information from ddb. Taking a photo of the
 screen after getting the back trace in ddb would also be fine if you
 would rather not write down the back trace by hand.
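
The settings involved in the advice above can be sketched as follows. This is a minimal example assuming a stock FreeBSD 8.x layout; the values are illustrative and were not posted in this thread. The in-kernel debugger is only available if the kernel was built with "options KDB" and "options DDB"; at the db> prompt, the bt command prints the back trace.

```sh
# /etc/rc.conf: let rc(8) choose a swap device for kernel crash dumps
# and have savecore(8) store them in this directory on the next boot
# (assumed default path).
dumpdev="AUTO"
dumpdir="/var/crash"

# /boot/loader.conf: direct the console to the first serial port so the
# panic message and ddb output can be logged from another machine.
console="comconsole"
```

With the serial line attached to a second machine, the panic text and the ddb back trace can be captured there even when the crash dump itself fails, as it did in this report.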
State-Changed-From-To: feedback->closed 
State-Changed-By: yongari 
State-Changed-When: Tue Sep 6 00:53:13 UTC 2011 
State-Changed-Why:  
Feedback timeout (> 3 months). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=154331 
>Unformatted:
