From nobody@FreeBSD.org  Sat May 26 07:31:01 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 2A94537B422
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 26 May 2001 07:31:01 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.1/8.11.1) id f4QEV0b32664;
	Sat, 26 May 2001 07:31:01 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200105261431.f4QEV0b32664@freefall.freebsd.org>
Date: Sat, 26 May 2001 07:31:01 -0700 (PDT)
From: pekkas@netcore.fi
To: freebsd-gnats-submit@FreeBSD.org
Subject: >1000 ipfw rules and heavy traffic crash the system
X-Send-Pr-Version: www-1.0

>Number:         27661
>Category:       kern
>Synopsis:       >1000 ipfw rules and heavy traffic crash the system
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    luigi-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat May 26 07:40:00 PDT 2001
>Closed-Date:    Mon Sep 3 13:25:17 PDT 2001
>Last-Modified:  Mon Sep 03 13:31:24 PDT 2001
>Originator:     Pekka Savola
>Release:        4.3-STABLE
>Organization:
Netcore
>Environment:
FreeBSD xxx.org 4.3-STABLE FreeBSD 4.3-STABLE #4: Thu May 10 14:00:09 EDT 2001     root@xxx.org:/usr/obj/usr/src/sys/DEN  i386

>Description:
See the following message and the threads mentioned there: http://docs.freebsd.org/cgi/getmsg.cgifetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable

I noticed that if you create too many ipfw rules that traffic must pass
through, you will crash the system rather soon.

In this scenario, adding >1000 non-matching rules before the
standard tcp established rule and pushing a steady 20 Mbit/s through the
rules drove the kernel load to ~8.0 (dual P3/866) and, after less than
an hour, crashed the system.

==> Of course, adding >1000 non-matching rules is stupid, but that is not
==> the point.  The system should not crash this way, without any error
==> messages.

The crash makes all of userspace totally unresponsive: ping and
traceroute from the outside work OK, but all existing connections become
unresponsive.  New TCP connections can still be established, but they hang
as soon as they need to communicate with the daemon.  The console keyboard
does not react to CTRL-ALT-DEL.

This is _not_ caused by mbuf/mbuf cluster usage; I have a cron job saving
these counters as debugging information every two minutes, and there was
no significant increase there; the peak never went above half of the
maximum.

The same crash has happened with a smaller number of non-matching rules
too, e.g. 600, though it usually took longer.

This happened 3-4 times before I realized what was going wrong.

Probably not relevant, but after every crash there were usually a _lot_
of file system inconsistencies.


>How-To-Repeat:
Add a lot of ipfw rules that traffic must pass through.
Generate _loads_ of traffic (20+ Mbit/s).
Wait for a few hours.
>Fix:
Rearrange the ipfw rules (this does not fix the _real_ problem, i.e. that
the system should not crash like this without printing any errors).
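The workaround helps because ipfw checks rules in order and stops at the first match, so every packet of an established TCP connection pays for every non-matching rule placed above the established rule. Below is a minimal Python model of that first-match scan, not ipfw code: the `Rule` and `Packet` types, the port-only matching, and the 1000 filler rules on ports 10000-10999 are hypothetical stand-ins for the reporter's ruleset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    established: bool  # belongs to an already-established TCP connection
    dst_port: int


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified rule: matches either any packet of an
    established TCP connection or a single destination port."""
    established: bool = False
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        if self.established:
            return pkt.established
        return pkt.dst_port == self.dst_port


def rules_examined(rules: List[Rule], pkt: Packet) -> int:
    """Count rules checked by a first-match scan before a match
    (or the whole list, if nothing matches)."""
    for i, rule in enumerate(rules, start=1):
        if rule.matches(pkt):
            return i
    return len(rules)


filler = [Rule(dst_port=p) for p in range(10000, 11000)]  # never match port 80
slow_order = filler + [Rule(established=True)]  # established rule last
fast_order = [Rule(established=True)] + filler  # established rule first

pkt = Packet(established=True, dst_port=80)
print("established rule last: ", rules_examined(slow_order, pkt))
print("established rule first:", rules_examined(fast_order, pkt))
```

Moving the established rule to the top cuts the work for established traffic from 1001 rule checks per packet to one; it does not reduce the cost for packets that genuinely have to traverse the whole list.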

>Release-Note:
>Audit-Trail:

From: Kris Kennaway <kris@obsecurity.org>
To: pekkas@netcore.fi
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/27661: >1000 ipfw rules and heavy traffic crash the system
Date: Sat, 26 May 2001 16:32:17 -0700

 On Sat, May 26, 2001 at 07:31:01AM -0700, pekkas@netcore.fi wrote:
 
 > >Description:
 > See and the threads mentioned there: http://docs.freebsd.org/cgi/getmsg.cgifetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable
 
 This URL does not seem to be valid.
 
 > I noticed that if you create too many ipfw rules, through which extra
 > traffic must pass, rather soon you will crash the system.
 >
 > In this scenario, adding >1000 non-matching rules before the
 > standard tcp established rule, and doing 20Mbit/s steady through the
 > rules, caused kernel load to go to ~8.0 (Dual P3/866) and after less than
 > an hour, crash the system.
 
 When you say "crash" do you mean "panic" (the usual meaning), or "lock
 up"?  If the former, please obtain a panic traceback to aid in debugging.
 
 It sounds to me as if this is just a case of giving the system too
 much work to do.  If it has to spend more time processing a packet
 than the time between packet arrival, things are going to go badly.
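To put rough numbers on this: at 20 Mbit/s the time between back-to-back packets depends on packet size, which the report does not state. The sketch below assumes 1500-byte (Ethernet MTU-sized) and 64-byte (minimum Ethernet frame) packets, ignores framing overhead, counts one direction only, and assumes every packet walks all 1000 rules. It computes only the time budget, not how long one rule check actually takes on the reporter's hardware.

```python
def inter_arrival_us(link_bps: float, packet_bytes: int) -> float:
    """Microseconds between back-to-back packets of packet_bytes at link_bps.

    Ignores framing overhead (preamble, inter-frame gap)."""
    return packet_bytes * 8 / link_bps * 1e6


def per_rule_budget_ns(link_bps: float, packet_bytes: int, rules: int) -> float:
    """Nanoseconds available per rule if each packet must walk every rule
    before the next packet arrives."""
    return inter_arrival_us(link_bps, packet_bytes) * 1000 / rules


RATE_BPS = 20e6  # 20 Mbit/s, from the report
RULES = 1000     # ">1000 non-matching rules", rounded down

for size in (1500, 64):  # assumed packet sizes in bytes
    print(f"{size:>4} B: {inter_arrival_us(RATE_BPS, size):6.1f} us between packets, "
          f"{per_rule_budget_ns(RATE_BPS, size, RULES):6.1f} ns per rule")
```

With full-size packets each packet has roughly 600 us, or about 600 ns per rule. With minimum-size packets the budget shrinks to about 25.6 us per packet and 25.6 ns per rule, only around 22 clock cycles at 866 MHz, so a 1000-rule list could plausibly fall behind.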
 
 As far as I know ipfw doesn't have an 'exit clause' which drops
 packets if they are taking too long to process.  I don't know if it
 would be easy to add one; the best solution, as you noted, is to not
 write inefficient rulesets.
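The "exit clause" idea can be sketched as a cap on how many rules one packet may examine before it is dropped. As noted above, ipfw has no such feature. The Python below is a hypothetical illustration only: it uses a rule count rather than elapsed time as the budget, and `evaluate`, `max_rules`, and the `(predicate, action)` rule representation are all made up for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule representation: a predicate over packet fields plus an action.
RuleEntry = Tuple[Callable[[Dict[str, object]], bool], str]

DROP_BUDGET = "drop (rule budget exceeded)"


def evaluate(rules: List[RuleEntry], packet: Dict[str, object],
             max_rules: int, default: str = "deny") -> str:
    """First-match evaluation that gives up after max_rules checks."""
    for examined, (match, action) in enumerate(rules, start=1):
        if examined > max_rules:
            return DROP_BUDGET  # the hypothetical "exit clause"
        if match(packet):
            return action
    return default  # no rule matched


# 1000 non-matching port rules, then an "established" allow rule.
port_rules: List[RuleEntry] = [
    (lambda p, port=port: p["dst_port"] == port, "deny")
    for port in range(10000, 11000)
]
allow_established: RuleEntry = (lambda p: bool(p["established"]), "allow")
packet = {"established": True, "dst_port": 80}

print(evaluate(port_rules + [allow_established], packet, max_rules=100))
print(evaluate([allow_established] + port_rules, packet, max_rules=100))
```

A count-based cap is deterministic; a time-based one would match the wording above more closely but would make filtering results depend on CPU load. Either way, the cap trades a lock-up for dropped legitimate traffic.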
 
 Kris
 
 

From: Pekka Savola <pekkas@netcore.fi>
To: Kris Kennaway <kris@obsecurity.org>
Cc: <freebsd-gnats-submit@FreeBSD.org>
Subject: Re: kern/27661: >1000 ipfw rules and heavy traffic crash the system
Date: Sun, 27 May 2001 09:13:53 +0300 (EEST)

 On Sat, 26 May 2001, Kris Kennaway wrote:
 
 > On Sat, May 26, 2001 at 07:31:01AM -0700, pekkas@netcore.fi wrote:
 >
 > > >Description:
 > > See and the threads mentioned there: http://docs.freebsd.org/cgi/getmsg.cgifetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable
 >
 > This URL does not seem to be valid.
 
 Hmm, cut'n'paste error perhaps.  Again:
 http://docs.freebsd.org/cgi/getmsg.cgi?fetch=856687+0+archive/2001/freebsd-stable/20010520.freebsd-stable
 
 Anyway, these were threads on freebsd-stable:
 
 "4.3-S: >1000 ipfw rules and heavy traffic crash the system" (18 May)
 "4.3-S: No buffer space available" (5 May)
 
 > > I noticed that if you create too many ipfw rules, through which extra
 > > traffic must pass, rather soon you will crash the system.
 > >
 > > In this scenario, adding >1000 non-matching rules before the
 > > standard tcp established rule, and doing 20Mbit/s steady through the
 > > rules, caused kernel load to go to ~8.0 (Dual P3/866) and after less than
 > > an hour, crash the system.
 >
 > When you say "crash" do you mean "panic" (the usual meaning), or "lock
 > up"?  If the former, please obtain a panic traceback to aid in debugging.
 
 Lock-up.  I wish it had panicked, so it could be traced :-/
 
 > It sounds to me as if this is just a case of giving the system too
 > much work to do.  If it has to spend more time processing a packet
 > than the time between packet arrival, things are going to go badly.
 
 There seems to be a point of no return: once the processing load passes
 some level, the TCP connections stop sending new data, etc.
 I haven't monitored what the bandwidth usage looks like then, but I
 suspect it is very low; I doubt there is an equally high number of
 incoming connections at that point.
 
 So, I don't think this is just slow processing.  It looks like overly
 heavy processing triggers some bigger problem that causes the lock-up.
 
 > As far as I know ipfw doesn't have an 'exit clause' which drops
 > packets if they are taking too long to process.  I don't know if it
 > would be easy to add one; the best solution, as you noted, is to not
 > write inefficient rulesets.
 
 I'm not used to (partial) kernel lock-ups without any messages printed on
 the console or to syslog; with these it is very difficult to figure out
 what is causing them.  That is why I'd like a "right" solution for this,
 not just "Don't Do It".  Someone is bound to do the same thing sooner or
 later and wonder about fscking FreeBSD locking up all the time without
 explanation.
 
 -- 
 Pekka Savola                 "Tell me of difficulties surmounted,
 Netcore Oy                   not those you stumble over and fall"
 Systems. Networks. Security.  -- Robert Jordan: A Crown of Swords
 
State-Changed-From-To: open->closed 
State-Changed-By: luigi 
State-Changed-When: Mon Sep 3 13:25:17 PDT 2001 
State-Changed-Why:  
This report basically says that when the system is in 
livelock conditions it might crash. 

This does not seem specific to the ipfw code -- the kernel is 
full of places where you can have potentially very time consuming 
processing procedures at various priorities (or, while holding locks) 
and cause havoc to the system. 
If this report identified a specific problem, I'd have no problem
fixing it, but there is nothing evident here. This is why
I am closing this PR.



Responsible-Changed-From-To: freebsd-bugs->luigi-bugs 
Responsible-Changed-By: luigi 
Responsible-Changed-When: Mon Sep 3 13:25:17 PDT 2001 
Responsible-Changed-Why:  
I have been involved in ipfw maintenance lately

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=27661 
>Unformatted:
