From admin@nanuoya.pdn.ac.lk  Fri May 25 20:34:23 2012
Return-Path: <admin@nanuoya.pdn.ac.lk>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ED18B1065672
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 25 May 2012 20:34:23 +0000 (UTC)
	(envelope-from admin@nanuoya.pdn.ac.lk)
Received: from nanuoya.pdn.ac.lk (unknown [IPv6:2401:dd00:30::1f00])
	by mx1.freebsd.org (Postfix) with ESMTP id 4DD2C8FC17
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 25 May 2012 20:34:22 +0000 (UTC)
Received: from nanuoya.pdn.ac.lk (localhost [127.0.0.1])
	by nanuoya.pdn.ac.lk (8.14.5/8.14.5) with ESMTP id q4PKYKAj038871;
	Sat, 26 May 2012 02:04:20 +0530 (IST)
	(envelope-from admin@nanuoya.pdn.ac.lk)
Received: (from admin@localhost)
	by nanuoya.pdn.ac.lk (8.14.5/8.14.5/Submit) id q4PKYKcB038870;
	Sat, 26 May 2012 02:04:20 +0530 (IST)
	(envelope-from admin)
Message-Id: <201205252034.q4PKYKcB038870@nanuoya.pdn.ac.lk>
Date: Sat, 26 May 2012 02:04:20 +0530 (IST)
From: Ziyan Maraikar <ziyanm@gmail.com>
Reply-To: Ziyan Maraikar <ziyanm@gmail.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc: Darshana Jayasinghe <darshana.jayasinghe@gmail.com>
Subject: mbuf exhaustion hangs all daemons in keglimit state
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         168342
>Category:       kern
>Synopsis:       [mbuf] mbuf exhaustion hangs all daemons in keglimit state
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 25 20:40:01 UTC 2012
>Closed-Date:    
>Last-Modified:  Tue May 29 16:50:05 UTC 2012
>Originator:     Ziyan Maraikar
>Release:        FreeBSD 9.0-RELEASE amd64
>Organization:
Department of computer engineering, University of Peradeniya
>Environment:
System: FreeBSD nanuoya.pdn.ac.lk 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
HP Proliant DL165 4-core, 8G RAM
4x igb NICs -- 1 interface assigned 6 IPv4 aliases.
3x 1TB SATA zfs RAID-Z pool (zfs boot)

>Description:
This machine has been running DHCP, BIND, NFS, and OpenLDAP, serving a lab of about 40 machines. The machine recently began to experience very frequent lockups in all network services, including ssh. The services all hang in state keglimit, even under very light load. I have tried disabling TSO and hardware checksumming on igb, as suggested in related mailing list posts, but it had no effect.

>How-To-Repeat:
Several ssh attempts after boot are enough to make all daemons hang in keglimit.
# netstat -m
25034/1602/26636 mbufs in use (current/cache/total)
24892/708/25600/25600 mbuf clusters in use (current/cache/total/max)
24642/708 mbuf+clusters out of packet secondary zone in use (current/cache)
0/9/9/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
56053K/1852K/57905K bytes allocated to network (current/cache/total)
0/1697/1209 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
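
The figures above can be checked mechanically. The following is a minimal Python sketch, not part of the original report: the helper names cluster_usage and clusters_denied are hypothetical, and the input is three lines copied from the netstat -m output above. It reads the cluster counters and reports whether the cluster zone has reached its configured maximum.

```python
import re

# Three lines copied from the netstat -m output in this report.
NETSTAT_M = """\
25034/1602/26636 mbufs in use (current/cache/total)
24892/708/25600/25600 mbuf clusters in use (current/cache/total/max)
0/1697/1209 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""


def cluster_usage(text):
    """Return (current, cache, total, max) from the 'mbuf clusters in use' line."""
    m = re.search(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", text, re.MULTILINE)
    if m is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    return tuple(int(g) for g in m.groups())


def clusters_denied(text):
    """Return the 'clusters' field of the 'requests for mbufs denied' line."""
    m = re.search(r"^\d+/(\d+)/\d+ requests for mbufs denied", text, re.MULTILINE)
    if m is None:
        raise ValueError("no 'requests for mbufs denied' line found")
    return int(m.group(1))


current, cache, total, limit = cluster_usage(NETSTAT_M)
at_limit = total >= limit  # total allocation has hit the configured max
print(f"clusters in use: {current}/{limit} ({100 * current / limit:.1f}%)")
print(f"zone at limit: {at_limit}, cluster requests denied: {clusters_denied(NETSTAT_M)}")
```

On these numbers the total allocation equals the maximum (25600) and about 97% of the clusters are in active use, which is consistent with the 1697 denied cluster requests.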

>Fix:

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-amd64->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon May 28 02:27:38 UTC 2012 
Responsible-Changed-Why:  
A customer of mine is also seeing this.  However, I do not believe it is 
amd64-specific. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=168342 

From: John Baldwin <jhb@freebsd.org>
To: freebsd-amd64@freebsd.org,
 Ziyan Maraikar <ziyanm@gmail.com>
Cc: FreeBSD-gnats-submit@freebsd.org,
 Darshana Jayasinghe <darshana.jayasinghe@gmail.com>
Subject: Re: amd64/168342: mbuf exhaustion hangs all daemons in keglimit state
Date: Tue, 29 May 2012 08:12:40 -0400

 Have you tried increasing kern.ipc.nmbclusters?  Alternatively, have you tried 
 restricting igb to only using 1 queue?  It sounds like all your igb interfaces 
 are allocating all of your mbuf clusters for their receive rings.
 
 -- 
 John Baldwin
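
The receive-ring hypothesis in the reply above can be put into rough numbers. This is a back-of-the-envelope Python sketch under stated assumptions, not a measurement: the report gives 4 igb NICs and a cluster limit of 25600, but the queue count per NIC (4, one per core) and the ring size (1024 descriptors) are assumed values that the report does not state. The function name rx_ring_clusters is hypothetical, and the model assumes each receive descriptor pins one mbuf cluster.

```python
def rx_ring_clusters(nics, queues_per_nic, rx_descriptors):
    """Estimate clusters held by receive rings, one cluster per descriptor."""
    return nics * queues_per_nic * rx_descriptors


NICS = 4                # from the report: 4x igb NICs
CLUSTER_LIMIT = 25600   # from the report: max in the netstat -m output
ASSUMED_RXD = 1024      # ASSUMPTION: descriptors per receive ring

# ASSUMPTION: 4 queues per NIC (one per core) versus the suggested 1 queue.
for queues in (4, 1):
    held = rx_ring_clusters(NICS, queues, ASSUMED_RXD)
    share = 100 * held / CLUSTER_LIMIT
    print(f"{queues} queue(s) per NIC: {held} clusters ({share:.0f}% of limit)")
```

If these assumptions hold, the rings alone would account for a large share of the cluster pool before any socket traffic is buffered, and cutting the queue count or raising kern.ipc.nmbclusters would both restore headroom.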

From: Ziyan Maraikar <ziyanm@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-amd64@freebsd.org,
 FreeBSD-gnats-submit@freebsd.org,
 Darshana Jayasinghe <darshana.jayasinghe@gmail.com>
Subject: Re: amd64/168342: mbuf exhaustion hangs all daemons in keglimit state
Date: Tue, 29 May 2012 22:14:02 +0530

 Hello John,
 
 Thanks for the response.
 >
 > Have you tried increasing kern.ipc.nmbclusters?  Alternatively, have you tried
 > restricting igb to only using 1 queue?  It sounds like all your igb interfaces
 > are allocating all of your mbuf clusters for their receive rings.
 >
 I found this very suggestion in several mailing list discussions [1] and
 set these values on Saturday.
 kern.ipc.nmbclusters="131072"
 hw.igb.num_queues="2"
 So far everything seems to be back to normal, and netstat -m shows plenty
 of headroom now.
 
 The problem cropped up after running several months on 9.0-RELEASE when
 I brought up another interface. Disabling the new interface didn't
 restore normal operation, however. I also tried 8.3-RELEASE but the
 problem was worse on it.
 
 [1] http://osdir.com/ml/freebsd-stable/2012-02/msg00563.html
 __
 Regards
 Ziyan.
>Unformatted:
