From dillon@apollo.backplane.com  Thu Dec 16 03:33:50 2004
Return-Path: <dillon@apollo.backplane.com>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 53B9E16A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 16 Dec 2004 03:33:50 +0000 (GMT)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DC93843D49
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 16 Dec 2004 03:33:49 +0000 (GMT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.12.9p2/8.12.9) with ESMTP id iBG3Xi0e000805;
	Wed, 15 Dec 2004 19:33:44 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id iBG3XhPU000804;
	Wed, 15 Dec 2004 19:33:43 -0800 (PST)
	(envelope-from dillon)
Message-Id: <200412160333.iBG3XhPU000804@apollo.backplane.com>
Date: Wed, 15 Dec 2004 19:33:43 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Dan Nelson <dnelson@allantgroup.com>
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: [PATCH] Incorrect inflight bandwidth calculation on first packet
References: <200412151827.iBFIRqDB019997@dan.emsphone.com>

>Number:         75140
>Category:       kern
>Synopsis:       Re: [PATCH] Incorrect inflight bandwidth calculation on first packet
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Dec 16 03:40:22 GMT 2004
>Closed-Date:    Thu Dec 16 08:27:52 GMT 2004
>Last-Modified:  Thu Dec 16 08:27:52 GMT 2004
>Originator:     
>Release:        
>Organization:
>Environment:
>Description:
 :>Synopsis:	[PATCH] Incorrect inflight bandwidth calculation on first packet
 :...
 :	
 :>Description:
 :
 :Matt, I'm CC'ing you because it looks like the bug is also in
 :Dragonfly, so I think you'll want to commit something similar.
 :
 :The inflight window scaling algorithm keeps a decaying average of a
 :socket's bandwidth, but on the first call to tcp_xmit_bandwidth_limit,
 :the sequence number of the previous packet is not known, so the
 :(ack_seq - tp->t_bw_rtseq) clause just gives you the raw sequence
 :number, resulting in a random value for the calculated bandwidth.
 :...
 :Until enough packets have traveled so the average has decayed to the
 :correct value, the calculated window is large enough that it's not even
 :used.  On a dialup, for example, it never gets a chance.
 :...
 :The minimal fix is to check for (tp->t_bw_rtttime == 0 || tp->t_bw_rtseq
 :== 0 || (int)bw < 0), and if it evaluates true, store the current values
 :and return.  I recommend committing the whole patch though.  It changed
 :the debugging output to print the instantaneous bw, and the current and
 :previous average bw.
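 
     To make the quoted failure mode concrete, here is a minimal standalone
     sketch of the per-sample calculation.  It is an illustration only and
     is not part of either patch: bw_sample() is a hypothetical helper that
     mirrors the expression (ack_seq - t_bw_rtseq) * hz / delta_ticks, and
     the hz and tick values in the note below are assumed for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the kernel): one instantaneous bandwidth
 * sample in bytes/sec.  The sequence-number difference is taken modulo
 * 2^32 before being widened, as with the kernel's 32-bit tcp_seq.
 */
static int64_t
bw_sample(uint32_t ack_seq, uint32_t prev_seq, int hz, int delta_ticks)
{
	return (int64_t)(uint32_t)(ack_seq - prev_seq) * hz / delta_ticks;
}
```

     With hz = 100 and 10 ticks between samples, 1460 acked bytes give
     14600 bytes/sec.  If the previous sequence number was never stored
     and is still 0, the same ACK at sequence 3000000000 gives
     30000000000 bytes/sec, which is the "raw sequence number" effect
     described above.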
 
     Hmm.  Well, looking at the original code it looks like t_bw_rtttime
     is properly initialized in tcp_newtcpcb(), so it will never be 0.
     t_bw_rtseq is initialized for outgoing connections, but not properly
     initialized for incoming connections.
 
     The original code was rather messy and really needs some reorganization,
     so I would recommend reorganizing it rather than just patching more
     mess on top.
 
     * Remove initialization of t_bw_rtttime so it can be tested against 0
       in the bandwidth limiting code.
 
     * Auto-initialize t_bw_rtttime in tcp_xmit_bandwidth_limit by testing
       it against 0.
 
     * Add an idle check.
 
     * Assign an initial bandwidth of tcp_inflight_min so high speed
       connections do not have to ramp up the exponential average all
       the way from 0.
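 
     The four points above can be sketched outside the kernel roughly as
     follows.  This is a simplified illustration under stated assumptions,
     not the committed code: struct bw_state and bw_update() are
     hypothetical stand-ins for the tcpcb fields and for the relevant part
     of tcp_xmit_bandwidth_limit(), and the tick counter, hz and
     tcp_inflight_min are passed in as plain arguments.

```c
#include <stdint.h>

/* Hypothetical stand-in for the tcpcb fields the estimator uses. */
struct bw_state {
	int      rtttime;        /* tick of last sample; 0 = not started */
	uint32_t rtseq;          /* sequence number at last sample */
	uint64_t snd_bandwidth;  /* decaying average, bytes/sec */
};

/*
 * Fold one ACK into the average.  Returns 1 if a sample was taken,
 * 0 if the call only (re)initialized state or was ignored.
 */
static int
bw_update(struct bw_state *st, uint32_t ack_seq, int now, int hz,
    uint64_t inflight_min)
{
	/* Subtract as unsigned so tick wraparound is well defined. */
	int delta_ticks = (int)((unsigned)now - (unsigned)st->rtttime);
	int64_t bw;

	/* New or long-idle connection: restart the calculator. */
	if (st->rtttime == 0 || delta_ticks < 0 || delta_ticks > hz * 10) {
		st->rtttime = now;
		st->rtseq = ack_seq;
		if (st->snd_bandwidth == 0)
			st->snd_bandwidth = inflight_min;
		return 0;
	}
	/* No time has passed, so no rate can be computed. */
	if (delta_ticks == 0)
		return 0;
	/* Sanity check, plus ignore pure window update acks. */
	if ((int32_t)(ack_seq - st->rtseq) <= 0)
		return 0;

	bw = (int64_t)(uint32_t)(ack_seq - st->rtseq) * hz / delta_ticks;
	st->rtttime = now;
	st->rtseq = ack_seq;
	/* Exponential average: 15/16 old value, 1/16 new sample. */
	st->snd_bandwidth = (st->snd_bandwidth * 15 + (uint64_t)bw) >> 4;
	return 1;
}
```

     The first call on a zeroed state only records the starting point and
     seeds the average with inflight_min, so the first real sample is
     averaged against a sane value instead of a raw sequence number.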
 
     The patch below is against DragonFly and will not apply to FreeBSD, but
     is included to demonstrate my recommendation.
 
 					-Matt
 					Matthew Dillon 
 					<dillon@backplane.com>
 
 Index: tcp_subr.c
 ===================================================================
 RCS file: /cvs/src/sys/netinet/tcp_subr.c,v
 retrieving revision 1.40
 diff -u -r1.40 tcp_subr.c
 --- tcp_subr.c	14 Nov 2004 00:49:08 -0000	1.40
 +++ tcp_subr.c	16 Dec 2004 03:22:55 -0000
 @@ -692,7 +692,6 @@
  	tp->snd_bwnd = TCP_MAXWIN << TCP_MAX_WINSHIFT;
  	tp->snd_ssthresh = TCP_MAXWIN << TCP_MAX_WINSHIFT;
  	tp->t_rcvtime = ticks;
 -	tp->t_bw_rtttime = ticks;
  	/*
  	 * IPv4 TTL initialization is necessary for an IPv6 socket as well,
  	 * because the socket may be bound to an IPv6 wildcard address,
 @@ -1834,6 +1833,7 @@
  	u_long bw;
  	u_long bwnd;
  	int save_ticks;
 +	int delta_ticks;
  
  	/*
  	 * If inflight_enable is disabled in the middle of a tcp connection,
 @@ -1846,26 +1846,38 @@
  	}
  
  	/*
 +	 * Validate the delta time.  If a connection is new or has been idle
 +	 * a long time we have to reset the bandwidth calculator.
 +	 */
 +	save_ticks = ticks;
 +	delta_ticks = save_ticks - tp->t_bw_rtttime;
 +	if (tp->t_bw_rtttime == 0 || delta_ticks < 0 || delta_ticks > hz * 10) {
 +		tp->t_bw_rtttime = ticks;
 +		tp->t_bw_rtseq = ack_seq;
 +		if (tp->snd_bandwidth == 0)
 +			tp->snd_bandwidth = tcp_inflight_min;
 +		return;
 +	}
 +	if (delta_ticks == 0)
 +		return;
 +
 +	/*
 +	 * Sanity check, plus ignore pure window update acks.
 +	 */
 +	if ((int)(ack_seq - tp->t_bw_rtseq) <= 0)
 +		return;
 +
 +	/*
  	 * Figure out the bandwidth.  Due to the tick granularity this
  	 * is a very rough number and it MUST be averaged over a fairly
  	 * long period of time.  XXX we need to take into account a link
  	 * that is not using all available bandwidth, but for now our
  	 * slop will ramp us up if this case occurs and the bandwidth later
  	 * increases.
 -	 *
 -	 * Note: if ticks rollover 'bw' may wind up negative.  We must
 -	 * effectively reset t_bw_rtttime for this case.
  	 */
 -	save_ticks = ticks;
 -	if ((u_int)(save_ticks - tp->t_bw_rtttime) < 1)
 -		return;
 -
 -	bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / 
 -	    (save_ticks - tp->t_bw_rtttime);
 +	bw = (int64_t)(ack_seq - tp->t_bw_rtseq) * hz / delta_ticks;
  	tp->t_bw_rtttime = save_ticks;
  	tp->t_bw_rtseq = ack_seq;
 -	if (tp->t_bw_rtttime == 0 || (int)bw < 0)
 -		return;
  	bw = ((int64_t)tp->snd_bandwidth * 15 + bw) >> 4;
  
  	tp->snd_bandwidth = bw;
 Index: tcp_usrreq.c
 ===================================================================
 RCS file: /cvs/src/sys/netinet/tcp_usrreq.c,v
 retrieving revision 1.29
 diff -u -r1.29 tcp_usrreq.c
 --- tcp_usrreq.c	8 Dec 2004 23:59:01 -0000	1.29
 +++ tcp_usrreq.c	16 Dec 2004 03:14:41 -0000
 @@ -873,7 +873,6 @@
  	tp->t_state = TCPS_SYN_SENT;
  	callout_reset(tp->tt_keep, tcp_keepinit, tcp_timer_keep, tp);
  	tp->iss = tcp_new_isn(tp);
 -	tp->t_bw_rtseq = tp->iss;
  	tcp_sendseqinit(tp);
  
  	/*
 @@ -1036,7 +1035,6 @@
  	tp->t_state = TCPS_SYN_SENT;
  	callout_reset(tp->tt_keep, tcp_keepinit, tcp_timer_keep, tp);
  	tp->iss = tcp_new_isn(tp);
 -	tp->t_bw_rtseq = tp->iss;
  	tcp_sendseqinit(tp);
  
  	/*
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: linimon 
State-Changed-When: Thu Dec 16 08:27:09 GMT 2004 
State-Changed-Why:  
Misfiled followup to kern/75122; content migrated. 


Responsible-Changed-From-To: gnats-admin->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Thu Dec 16 08:27:09 GMT 2004 
Responsible-Changed-Why:  

http://www.freebsd.org/cgi/query-pr.cgi?pr=75140 
>Unformatted:
