From pawmal@unia.3lo.lublin.pl  Thu Feb  6 07:21:21 2003
Return-Path: <pawmal@unia.3lo.lublin.pl>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2817537B401
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  6 Feb 2003 07:21:21 -0800 (PST)
Received: from unia.3lo.lublin.pl (unia.3lo.lublin.pl [212.182.70.2])
	by mx1.FreeBSD.org (Postfix) with SMTP id EE93E43F3F
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  6 Feb 2003 07:21:19 -0800 (PST)
	(envelope-from pawmal@unia.3lo.lublin.pl)
Received: (qmail 52737 invoked by uid 1007); 6 Feb 2003 15:21:57 -0000
Message-Id: <20030206152157.52736.qmail@unia.3lo.lublin.pl>
Date: 6 Feb 2003 15:21:57 -0000
From: "Paweł Małachowski" <pawmal@unia.3lo.lublin.pl>
Reply-To: "Paweł Małachowski" <pawmal@unia.3lo.lublin.pl>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: dummynet(4) related machine hangs
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         48009
>Category:       kern
>Synopsis:       dummynet(4) related machine hangs
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    maxim
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 06 07:30:05 PST 2003
>Closed-Date:    Fri Mar 28 03:53:24 PST 2003
>Last-Modified:  Fri Mar 28 03:53:24 PST 2003
>Originator:     Paweł Małachowski
>Release:        FreeBSD 4.7-STABLE i386
>Organization:
ASK ZiN
>Environment:
System: FreeBSD gargantua.zin.ask 4.7-STABLE FreeBSD 4.7-STABLE #0: Mon Feb  3 23:33:35 CET 2003 root@gargantua.zin.ask:/mnt/j1/obj/usr/src/sys/PM-UX-AUTO-47S  i386

	
>Description:
	

The machine hangs within a few hours of reconfiguring dummynet(4) pipes.
The system looks completely `frozen'; only a power-off/power-on cycle helps.
It's easy to reproduce, see the How-To-Repeat section.

This is 4.7-STABLE, compiled from sources cvsupped on 3 February 2003.
The IPFW2 IP firewall code is used.
However, similar problems were noticed in the past with IPFW1;
please compare with kern/37573 and kern/43133.
It looks like this was reported 9 months ago but never fixed.


>How-To-Repeat:
	
Configure dummynet as shown here and run the test-ipfw.sh script.
rl0 is my external interface.

# ipfw show
65535   14249742 10902331939 allow ip from any to any

# ipfw add 10 pipe 10 ip from any to any out xmit rl0
# ipfw pipe 10 config bw 0
# ipfw add 20 pipe 20 ip from any to any in recv rl0
# ipfw pipe 20 config bw 0
# sh ./test-ipfw.sh
[...]
Test number 375 (Czw 6 Lut 14:52:29 2003 CET).
Step 1.
Step 2.
Step 3.

And the machine hangs.

Once again, with the same configuration:
Test number 22 (Czw 6 Lut 15:12:36 2003 CET).
Step 1.
Step 2.
Step 3.

And the machine hangs again.

This script was also `successfully' tested by a friend of mine on another
4.7-STABLE machine -- he reported a hang within 3 minutes...

Here is the script I used to provoke the problem:
# cat test-ipfw.sh
#!/bin/sh
i=1
while [ 1 ]
do
 echo Test number $i \(`date`\).
 echo Step 1.
 ipfw pipe 10 config bw 2Mbit/s queue 10
 sleep 1
 echo Step 2.
 ipfw pipe 20 config bw 2Mbit/s queue 10
 sleep 2
 echo Step 3.
 ipfw pipe 10 config bw 4096kbit/s queue 20
 sleep 1
 echo Step 4.
 ipfw pipe 20 config bw 4096kbit/s queue 20
 echo OK, waiting 3 seconds, and...
 sleep 3
 i=`expr $i + 1`
done


>Fix:
	
Unknown.

>Release-Note:
>Audit-Trail:

From: Maxim Konovalov <maxim@macomnet.ru>
To: =?KOI8-R?Q?Pawe=B3_Ma=B3achowski?= <pawmal@unia.3lo.lublin.pl>
Cc: bug-followup@freebsd.org
Subject: Re: kern/48009: dummynet(4) related machine hangs
Date: Thu, 6 Feb 2003 18:33:58 +0300 (MSK)

 Known problem. Could you please try a patch below?
 
 Index: ip_dummynet.c
 ===================================================================
 RCS file: /home/maxim/cvs/sys/netinet/ip_dummynet.c,v
 retrieving revision 1.6
 retrieving revision 1.7
 diff -u -r1.6 -r1.7
 --- ip_dummynet.c	9 Dec 2002 12:41:50 -0000	1.6
 +++ ip_dummynet.c	9 Dec 2002 13:45:30 -0000	1.7
 @@ -1547,6 +1547,7 @@
  	} else
  	    x = b;
 
 +	s = splimp();
  	    x->bandwidth = p->bandwidth ;
  	x->numbytes = 0; /* just in case... */
  	bcopy(p->if_name, x->if_name, sizeof(p->if_name) );
 @@ -1561,14 +1562,13 @@
  		free(x, M_DUMMYNET);
  		return s ;
  	    }
 -	    s = splimp() ;
  	    x->next = b ;
  	    if (a == NULL)
  		all_pipes = x ;
  	    else
  		a->next = x ;
 -	    splx(s);
  	}
 +	splx(s);
      } else { /* config queue */
  	struct dn_flow_set *x, *a, *b ;
 
 @@ -1597,6 +1597,7 @@
  		return EINVAL ;
  	    x = b;
  	}
 +	s = splimp();
  	set_fs_parms(x, pfs);
 
  	if ( x->rq == NULL ) { /* a new flow_set */
 @@ -1605,14 +1606,13 @@
  		free(x, M_DUMMYNET);
  		return s ;
  	    }
 -	    s = splimp() ;
  	    x->next = b;
  	    if (a == NULL)
  		all_flow_sets = x;
  	    else
  		a->next = x;
 -	    splx(s);
  	}
 +	splx(s);
      }
      return 0 ;
  }
 
 %%%
 
 -- 
 Maxim Konovalov, maxim@macomnet.ru, maxim@FreeBSD.org
 

From: "Pawel Malachowski" <pawmal@unia.3lo.lublin.pl>
To: Maxim Konovalov <maxim@macomnet.ru>
Cc: bug-followup@freebsd.org
Subject: Re: kern/48009: dummynet(4) related machine hangs
Date: Thu, 6 Feb 2003 18:39:11 +0100

 On 6 Feb 03, at 18:33, Maxim Konovalov wrote:
 
 > Known problem. Could you please try a patch below?
 
 I've successfully applied this patch, then recompiled and reinstalled
 the kernel as soon as possible.
 
 Well, the machine ran without problems for quite a long time this time,
 but it finally hung as usual:
 
 # sh ./test-ipfw.sh
 [...]
 Test number 511 (Czw 6 Lut 18:18:54 2003 CET).
 Step 1.
 Step 2.
 Step 3.
 Step 4.
 OK, waiting 3 seconds, and...
 
 It froze after 511 iterations of the test-ipfw.sh script.
 
 While testing, I added: ipfw add 5 skipto 4000 ip from any to any
 to bypass the pipes for a while without stopping the script. This rule
 was removed after a few minutes, about half an hour before the system hung.
 (I can't say whether it was related to the hang or not.)
 
 The problem is still there. :/
 
 
 -- 
 Paweł Małachowski
 

From: "Pawel Malachowski" <pawmal@unia.3lo.lublin.pl>
To: Maxim Konovalov <maxim@macomnet.ru>
Cc: bug-followup@freebsd.org, freebsd-ipfw@freebsd.org
Subject: Re: kern/48009: dummynet(4) related machine hangs
Date: Mon, 10 Feb 2003 13:00:33 +0100

 Hello,
 
 	I've checked whether q->numbytes really goes negative,
 as Mike Hibler described in kern/37573, and found that it does.
 My machine goes into an infinite loop inside the splimp()/splx()
 sections, and that's why it looks as if it were frozen.
 I decided to check whether the problem exists on 2 other machines
 (4.7-RELEASE and 4.7-STABLE), and I can confirm that all of them hang
 in the same way when I frequently modify the bandwidth parameter.
 
 Once again,
 * Install fresh FreeBSD 4.7-RELEASE
 * Recompile GENERIC kernel with IPFW2, rebuild ipfw and libalias
   as described in ipfw(8), reinstall and reboot
 * Do:
 	ipfw pipe 10 config bw 0
 	ipfw pipe 20 config bw 0
 	ipfw add 10 pipe 10 ip from any to any in recv NIC
 	ipfw add 20 pipe 20 ip from any to any out xmit NIC
 
   (my NIC was rl0 or fxp0, i.e. the NIC used to connect to the LAN)
   Then connect, for example, to a fast FTP server on the LAN and
   GET some big file (hundreds of MB).
   While this file is downloading, run the following script:
 	#!/bin/sh
 	i=1
 	while [ 1 ]
 	do
 	 echo Test number $i \(`date`\).
 	 echo Step 1.
 	 ipfw pipe 10 config bw 512kbit/s
 	 echo Step 2.
 	 ipfw pipe 20 config bw 512kbit/s
 	 sleep 1
 	 echo Step 3.
 	 ipfw pipe 10 config bw 2Mbit/s
 	 echo Step 4.
 	 ipfw pipe 20 config bw 2Mbit/s
 	 sleep 1
 	 i=`expr $i + 1`
 	done
 * Watch q->numbytes: after a while it grows quickly up to 2^31-1
   and wraps to a negative value, causing the system to hang.
 
 
 What is going on?
 In ip_dummynet.c, in ready_event(), the following line (551)
 makes q->numbytes negative:
 
 	q->numbytes += ( curr_time - q->sched_time ) * p->bandwidth;
 
 causing dummynet() to go into an infinite loop with ready_event().
 Note that in line 557 we decrease q->numbytes, which prevents it from
 growing too much:
 	q->numbytes -= len_scaled ;
 (where len_scaled = pkt->dn_m->m_pkthdr.len * 8 * hz)
 
 When we frequently modify the bandwidth using ipfw pipe config bw xxx,
 q->numbytes starts growing towards the maximum signed integer value.
 This was easy to observe because I added a diagnostic printf
 to config_pipe() showing the current q->numbytes value every time
 config_pipe() was called (every time ipfw pipe config bw xxx
 was used).
 It looks like when we are downloading something big (meaning we
 have high traffic on pipe 10 (recv)), the q->numbytes associated
 with pipe 20 (xmit!) grows fast.
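 
 The wrap-around described above can be illustrated with a minimal
 user-space sketch in C. This is not the dummynet code: it assumes a
 32-bit signed counter, models only the credit added on each call
 (nothing drains it, as if the subtraction of len_scaled never ran),
 and wrap_to_int32() and calls_until_negative() are hypothetical
 helper names introduced only for this illustration.
 
 ```c
 #include <assert.h>
 #include <stdint.h>
 
 /*
  * Reduce a 64-bit value to what a 32-bit two's-complement signed
  * counter would hold, without relying on signed overflow (which is
  * undefined behaviour in C).
  */
 static int32_t wrap_to_int32(int64_t v)
 {
     uint32_t u = (uint32_t)v;          /* modulo 2^32, well defined */
 
     if (u <= (uint32_t)INT32_MAX)
         return (int32_t)u;
     return (int32_t)((int64_t)u - 4294967296LL);
 }
 
 /*
  * Hypothetical model of the accumulation step: every call adds
  * elapsed_ticks * bandwidth to the counter and nothing is ever
  * subtracted. Returns the number of calls after which the counter
  * first becomes negative.
  */
 static int calls_until_negative(int32_t bandwidth, int32_t elapsed_ticks)
 {
     int32_t numbytes = 0;
     int calls = 0;
 
     /* Keep one increment below 2^31 so the loop always terminates. */
     assert(bandwidth > 0 && elapsed_ticks > 0);
     assert((int64_t)elapsed_ticks * bandwidth <= INT32_MAX);
 
     while (numbytes >= 0) {
         numbytes = wrap_to_int32((int64_t)numbytes +
             (int64_t)elapsed_ticks * bandwidth);
         calls++;
     }
     return calls;
 }
 ```
 
 Under these assumptions, calls_until_negative(2000000, 1) returns 1074:
 a counter fed 2000000 units per call stays non-negative for 1073 calls
 and turns negative on the next one.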
 
 
 This patch (a workaround) comes from kern/37573 and was changed
 a bit to apply cleanly to the RELENG_4_7 ip_dummynet.c. It works
 for me and prevents the machine from hanging.
 
 ====================
 *** ip_dummynet.c.orig	Thu Jan 23 22:06:45 2003
 --- ip_dummynet.c	Sun Feb  9 23:51:49 2003
 ***************
 *** 549,554 ****
 --- 549,559 ----
        * setting len_scaled = 0 does the job.
        */
       q->numbytes += ( curr_time - q->sched_time ) * p->bandwidth;
 +     if (q->numbytes<0) {
 + /* This shouldn't happen, I clear q->numbytes in config_pipe() */
 + printf("Oops, ready_event has a problem with q->numbytes<0.\n");
 + q->numbytes=0 ;
 +     }
       while ( (pkt = q->head) != NULL ) {
   int len = pkt->dn_m->m_pkthdr.len;
   int len_scaled = p->bandwidth ? len*8*hz : 0 ;
 ***************
 *** 1515,1521 ****
   static int
   config_pipe(struct dn_pipe *p)
   {
 !     int s ;
       struct dn_flow_set *pfs = &(p->fs);
  
       /*
 --- 1520,1526 ----
   static int
   config_pipe(struct dn_pipe *p)
   {
 !     int s = 0;
       struct dn_flow_set *pfs = &(p->fs);
  
       /*
 ***************
 *** 1549,1561 ****
       x->idle_heap.size = x->idle_heap.elements = 0 ;
       x->idle_heap.offset=OFFSET_OF(struct dn_flow_queue, heap_pos);
   } else
       x = b;
  
 !     x->bandwidth = p->bandwidth ;
   x->numbytes = 0; /* just in case... */
   bcopy(p->if_name, x->if_name, sizeof(p->if_name) );
   x->ifp = NULL ; /* reset interface ptr */
 !     x->delay = p->delay ;
   set_fs_parms(&(x->fs), pfs);
  
  
 --- 1554,1579 ----
       x->idle_heap.size = x->idle_heap.elements = 0 ;
       x->idle_heap.offset=OFFSET_OF(struct dn_flow_queue, heap_pos);
   } else
 + {   struct dn_flow_queue *q;
 +     int i;
 +
       x = b;
 +     s = splimp(); /* protect mods to active pipe/flow set */
 +
 +     /* Obtained from kern/37573 Audit-Trail    */
 +     /* flush accumulated credit for all queues */
 +     for (i = 0 ; i <= x->fs.rq_size ; i++ )
 + for (q = x->fs.rq[i] ; q ; q = q->next ) {
 +     q->numbytes = 0;
 + }
 +     }
 +       
  
 ! x->bandwidth = p->bandwidth ;
   x->numbytes = 0; /* just in case... */
   bcopy(p->if_name, x->if_name, sizeof(p->if_name) );
   x->ifp = NULL ; /* reset interface ptr */
 ! x->delay = p->delay ;
   set_fs_parms(&(x->fs), pfs);
  
  
 ***************
 *** 1571,1578 ****
   all_pipes = x ;
       else
   a->next = x ;
 -     splx(s);
   }
       } else { /* config queue */
   struct dn_flow_set *x, *a, *b ;
  
 --- 1589,1596 ----
   all_pipes = x ;
       else
   a->next = x ;
   }
 + splx(s);
       } else { /* config queue */
   struct dn_flow_set *x, *a, *b ;
  
 ***************
 *** 1600,1605 ****
 --- 1618,1624 ----
       if (pfs->parent_nr != 0 && b->parent_nr != pfs->parent_nr)
   return EINVAL ;
       x = b;
 +     s = splimp(); /* protect mods to active pipe/flow set */
   }
   set_fs_parms(x, pfs);
  
 ***************
 *** 1615,1622 ****
   all_flow_sets = x;
       else
   a->next = x;
 -     splx(s);
   }
       }
       return 0 ;
   }
 --- 1634,1641 ----
   all_flow_sets = x;
       else
   a->next = x;
   }
 + splx(s);
       }
       return 0 ;
   }
 ====================
 
 
 
 -- 
 Paweł Małachowski
State-Changed-From-To: open->closed 
State-Changed-By: maxim 
State-Changed-When: Fri Mar 28 03:52:51 PST 2003 
State-Changed-Why:  
I believe your problem report is a duplicate of bin/37573. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=37573 

I have already committed a fix for it in -CURRENT and am
going to MFC it to -STABLE in six weeks.


Responsible-Changed-From-To: freebsd-bugs->maxim 
Responsible-Changed-By: maxim 
Responsible-Changed-When: Fri Mar 28 03:52:51 PST 2003 
Responsible-Changed-Why:  
Feedback trap.

http://www.freebsd.org/cgi/query-pr.cgi?pr=48009 
>Unformatted:
