From gnn@waits.engr.nominum.com  Wed Sep 18 14:42:08 2002
Return-Path: <gnn@waits.engr.nominum.com>
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 019FF37B401
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 18 Sep 2002 14:42:08 -0700 (PDT)
Received: from waits.engr.nominum.com (waits.engr.nominum.com [128.177.194.35])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B03C943E3B
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 18 Sep 2002 14:42:07 -0700 (PDT)
	(envelope-from gnn@waits.engr.nominum.com)
Received: (from gnn@localhost)
	by waits.engr.nominum.com (8.11.6/8.11.6) id g8ILg7130781;
	Wed, 18 Sep 2002 14:42:07 -0700 (PDT)
	(envelope-from gnn)
Message-Id: <200209182142.g8ILg7130781@waits.engr.nominum.com>
Date: Wed, 18 Sep 2002 14:42:07 -0700 (PDT)
From: "George V. Neville-Neil" <gnn@neville-neil.com>
Reply-To: "George V. Neville-Neil" <gnn@neville-neil.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: panic when ARP cache uses up all mbufs
X-Send-Pr-Version: 3.113
X-GNATS-Notify: ru

>Number:         42937
>Category:       kern
>Synopsis:       panic when ARP cache uses up all mbufs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bms
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 18 14:50:01 PDT 2002
>Closed-Date:    Sat Sep 23 20:16:07 GMT 2006
>Last-Modified:  Sat Sep 23 20:20:18 GMT 2006
>Originator:     George V. Neville-Neil
>Release:        FreeBSD 4.5-RELEASE i386
>Organization:
Nominum Inc. 
>Environment:
System: FreeBSD waits.engr.nominum.com 4.5-RELEASE FreeBSD 4.5-RELEASE #0: Mon Jun 24 14:33:42 PDT 2002 gnn@waits.engr.nominum.com:/usr/src/sys/compile/WAITS i386

>Description:
	We have an application that can send a ping to many different clients
very quickly (it is a DHCP server).  On FreeBSD each ping causes an ARP entry
to be created and the associated mbuf (for the outgoing ICMP packet) to be
stored with it.  This packet is held until the ARP reply comes back or the
entry times out.  The ARP cache is timed out only every 5 minutes, so it is
quite easy to use up all the mbufs in that time; after that, any mbuf
allocation that panics on failure causes the kernel to die.

>How-To-Repeat:

	Write a program that pings many addresses.
	Watch netstat -m.
	Watch kernel die.

>Fix:
	None known.
>Release-Note:
>Audit-Trail:

From: Gleb Smirnoff <glebius@cell.sick.ru>
To: "George V. Neville-Neil" <gnn@waits.engr.nominum.com>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/42937: panic when ARP cache uses up all mbufs
Date: Thu, 19 Sep 2002 18:28:26 +0400

  Hi!
 
  I have faced the same situation. A box that had worked for 10
 months without any unwanted reboot began to reboot quite often
 (3 times a week or even more) after I installed a script
 that pings two directly connected /24 subnets.
 
  If someone takes this PR, I can provide the needed debug
 info.
 
 On Wed, Sep 18, 2002 at 02:42:07PM -0700, George V. Neville-Neil wrote:
 G> >Number:         42937
 G> >Category:       kern
 G> >Synopsis:       panic when ARP cache uses up all mbufs
 G> >Confidential:   no
 G> >Severity:       critical
 G> >Priority:       high
 G> >Responsible:    freebsd-bugs
 G> >State:          open
 G> >Quarter:        
 G> >Keywords:       
 G> >Date-Required:
 G> >Class:          sw-bug
 G> >Submitter-Id:   current-users
 G> >Arrival-Date:   Wed Sep 18 14:50:01 PDT 2002
 G> >Closed-Date:
 G> >Last-Modified:
 G> >Originator:     George V. Neville-Neil
 G> >Release:        FreeBSD 4.5-RELEASE i386
 G> >Organization:
 G> Nominum Inc. 
 G> >Environment:
 G> System: FreeBSD waits.engr.nominum.com 4.5-RELEASE FreeBSD 4.5-RELEASE #0: Mon Jun 24 14:33:42 PDT 2002 gnn@waits.engr.nominum.com:/usr/src/sys/compile/WAITS i386
 G> 
 G> >Description:
 G> 	We have an application that can send a ping to many different clients
 G> very quickly (it is a DHCP server).  On FreeBSD each ping causes an ARP entry
 G> to be created and the associated mbuf (for the outgoing ICMP packet) to be
 G> stored with it.  This packet is trapped until the ARP reply comes back or it
 G> is timed out.  The arp cache is timed out every 5 minutes.  It is quite easy
 G> to use up all the mbufs in this time, and then any mbuf allocation that 
 G> panics on failure causes the kernel to die.
 G> 
 G> >How-To-Repeat:
 G> 
 G> 	Write a program that pings many addresses.
 G> 	Watch netstat -m.
 G> 	Watch kernel die.
 G> 
 G> >Fix:
 G> 	None known.
 G> >Release-Note:
 G> >Audit-Trail:
 G> >Unformatted:
 G> 
 
 -- 
 Totus tuus, Glebius.
 GLEBIUS-RIPN GLEB-RIPE

From: Robert Watson <rwatson@FreeBSD.ORG>
To: "George V. Neville-Neil" <gnn@waits.engr.nominum.com>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/42937: panic when ARP cache uses up all mbufs
Date: Fri, 20 Sep 2002 14:18:09 -0400 (EDT)

 A potential fix for this problem would be to place a bound on the number
 (total size?) of mbufs permitted to be allocated for holding onto packets
 pending an arp resolution.  Since interfaces are potentially lossy anyway
 due to queuing issues, protocols should recover.  Such a "solution" would
 be easy to implement in that it would be a constant (possibly scaled based
 on some or another configuration parameter) in addition to an outstanding
 count for the number of saved packets.  If adding a packet to the saved
 packet set would exceed the limit, then don't.
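 
 The bounded-count idea above can be sketched as follows. This is only an
 illustration under stated assumptions, not code from the FreeBSD tree: the
 struct arp_hold_limit, its fields, and both functions are hypothetical
 names, and a real kernel version would also need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting for packets parked on unresolved ARP entries. */
struct arp_hold_limit {
	size_t held;      /* packets currently waiting on ARP resolution */
	size_t max_held;  /* assumed tunable upper bound on held packets */
	size_t dropped;   /* packets refused because the bound was reached */
};

/*
 * Ask for permission to hold one more packet.  Returns true if the caller
 * may keep the packet; returns false if keeping it would exceed the limit,
 * in which case the caller is expected to free the packet instead.
 */
bool
arp_hold_reserve(struct arp_hold_limit *l)
{
	assert(l != NULL);
	if (l->held >= l->max_held) {
		l->dropped++;
		return false;
	}
	l->held++;
	return true;
}

/* Give a slot back once a held packet has been sent or discarded. */
void
arp_hold_release(struct arp_hold_limit *l)
{
	assert(l != NULL);
	if (l->held > 0)
		l->held--;
}
```

 In this sketch the caller would invoke arp_hold_reserve() before saving a
 packet on an unresolved entry and arp_hold_release() when the packet is
 transmitted or the entry expires, so the number of mbufs pinned by ARP can
 never exceed max_held.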
 
 Another possible strategy is to have two separate timeouts: the current
 long timeout for an arp query to expire, but a second shorter timeout for
 dropping data that is waiting on the arp.  I.e., if arp doesn't complete
 in thirty seconds, drop the packet that caused the arp, but allow the arp
 process to continue the full five minutes. 
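 
 A minimal sketch of the two-timeout idea, assuming a periodic timer that
 inspects each pending entry.  The names struct arp_pending, enum arp_action
 and arp_pending_check() are hypothetical, the held packet is represented by
 a plain pointer rather than a real mbuf, and the two constants simply mirror
 the thirty-second and five-minute figures mentioned in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define ARP_HOLD_TIMEOUT   30	/* seconds a packet may wait on ARP */
#define ARP_ENTRY_TIMEOUT  300	/* seconds before the ARP query itself expires */

/* Hypothetical pending (unresolved) ARP entry. */
struct arp_pending {
	time_t	created;	/* when the ARP request was first issued */
	void	*held_packet;	/* stand-in for the held mbuf, NULL if none */
};

enum arp_action {
	ARP_KEEP,		/* nothing to do yet */
	ARP_DROP_PACKET,	/* free the held packet, keep the entry */
	ARP_EXPIRE_ENTRY	/* give up on the entry entirely */
};

/* Decide what the timer should do with one pending entry at time 'now'. */
enum arp_action
arp_pending_check(const struct arp_pending *e, time_t now)
{
	time_t age;

	assert(e != NULL);
	age = now - e->created;
	if (age >= ARP_ENTRY_TIMEOUT)
		return ARP_EXPIRE_ENTRY;
	if (age >= ARP_HOLD_TIMEOUT && e->held_packet != NULL)
		return ARP_DROP_PACKET;
	return ARP_KEEP;
}
```

 With this split, a flood of pings to unresponsive addresses would pin each
 mbuf for at most the short hold timeout, while the ARP entry itself still
 lives for the full long timeout.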
 
 Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
 robert@fledge.watson.org      Network Associates Laboratories
 
Responsible-Changed-From-To: freebsd-bugs->bms 
Responsible-Changed-By: maxim 
Responsible-Changed-When: Thu Sep 25 03:12:21 PDT 2003 
Responsible-Changed-Why:  
It looks like bms just fixed this bug. 

Bruce, could you confirm that? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=42937 

From: Bruce M Simpson <bms@spc.org>
To: freebsd-gnats-submit@FreeBSD.org, gnn@waits.engr.nominum.com
Cc:  
Subject: Re: kern/42937: panic when ARP cache uses up all mbufs
Date: Fri, 26 Sep 2003 14:32:17 +0100

 No, the ARP starvation bug identified in FreeBSD-SA-03:14.arp is not
 the same bug. ru@ and I are investigating this one, though (they are
 separate, and only related insofar as they are present in the same
 module).
 
 BMS

From: Bruce M Simpson <bms@spc.org>
To: freebsd-gnats-submit@FreeBSD.org, gnn@waits.engr.nominum.com
Cc:  
Subject: Re: kern/42937: panic when ARP cache uses up all mbufs
Date: Tue, 30 Sep 2003 03:41:17 +0100

 I've managed to replicate this bug using Ruslan's script and will
 continue investigations.
 
 BMS
State-Changed-From-To: open->analyzed 
State-Changed-By: bms 
State-Changed-When: Thu 27 Nov 2003 02:37:10 PST 
State-Changed-Why:  
I concur with rwatson's analysis. However, when trying to replicate a panic 
with Ruslan's script today, I hit the mbuf and file resource limits before 
any harm could be done, on both 4.9-RELEASE and 5.2-BETA. 

We should probably implement one of the two strategies outlined, as it is 
still possible for a hostile local user to cause resource starvation, though 
not an outright panic. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=42937 
State-Changed-From-To: analyzed->suspended 
State-Changed-By: bms 
State-Changed-When: Tue 2 Dec 2003 03:07:33 PST 
State-Changed-Why:  
Submitter's email bounces: 

Diagnostic-Code: X-Postfix; connect to waits.engr.nominum.com[128.177.194.35]: 
No route to host                                                           


http://www.freebsd.org/cgi/query-pr.cgi?pr=42937 
State-Changed-From-To: suspended->analyzed 
State-Changed-By: bms 
State-Changed-When: Wed Jun 16 02:20:26 GMT 2004 
State-Changed-Why:  
We haven't forgotten about this. Waiting to see what andre@ 
comes up with in the arp rewrite. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=42937 
State-Changed-From-To: analyzed->closed 
State-Changed-By: bms 
State-Changed-When: Sat Sep 23 20:15:38 UTC 2006 
State-Changed-Why:  
Panic went away due to Bosko-mbuf-allocator-rewrite (yay) but 
kernel resource starvation is still possible. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=42937 

From: Bruce M Simpson <bms@incunabulum.net>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/42937: panic when ARP cache uses up all mbufs
Date: Sat, 23 Sep 2006 21:14:59 +0100

 Hi,
 
 I attempted to reproduce this panic today in a 512MB QEMU simulation 
 with fping across 10.10.0.0/16.
 
 I could not panic FreeBSD 7.0-CURRENT. I suspect that the panic reported 
 in this PR went away during the 5.x lifetime due to Bosko's mbuf allocator 
 rewrite.
 
 To try it do this:
 /usr/local/sbin/fping -r 1000 -i 2 -t 500000 -g 10.10.0.0/16
 
 Note that netstat -m utilization peaked around 24000 mbufs with 6MB of 
 wired kernel memory allocated to mbufs, but no crash -- just resource 
 starvation.
 
 Regards,
 BMS
>Unformatted:
