From kml@roller.nas.nasa.gov  Thu Mar 19 14:59:30 1998
Received: from roller.nas.nasa.gov (roller.nas.nasa.gov [129.99.223.26])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id OAA21946
          for <FreeBSD-gnats-submit@freebsd.org>; Thu, 19 Mar 1998 14:59:29 -0800 (PST)
          (envelope-from kml@roller.nas.nasa.gov)
Received: (from kml@localhost)
	by roller.nas.nasa.gov (8.8.7/8.8.7) id OAA00289;
	Thu, 19 Mar 1998 14:59:26 -0800 (PST)
	(envelope-from kml)
Message-Id: <199803192259.OAA00289@roller.nas.nasa.gov>
Date: Thu, 19 Mar 1998 14:59:26 -0800 (PST)
From: Kevin Lahey <kml@roller.nas.nasa.gov>
Reply-To: kml@roller.nas.nasa.gov
To: FreeBSD-gnats-submit@freebsd.org
Subject: TCP retransmission bug
X-Send-Pr-Version: 3.2

>Number:         6068
>Category:       i386
>Synopsis:       TCP can time out of retransmission in 12 seconds
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    davidg
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 19 15:00:02 PST 1998
>Closed-Date:    Fri Apr 24 02:25:41 PDT 1998
>Last-Modified:  Fri Apr 24 02:26:43 PDT 1998
>Originator:     Kevin Lahey
>Release:        FreeBSD 2.2.5-RELEASE i386
>Organization:
NASA/Ames
>Environment:

This is a fresh, patch-free installation of 2.2.5.

>Description:

In some circumstances, when the round-trip time is very low, it
is possible for TCP to time out in just 12 seconds, after sending
12 packets:

14:49:02.585137 roller.1026 > yakko-work.discard: . 2195169:2196617(1448) ack 1 win 17376 <nop,nop,timestamp 403 8642> (DF) (ttl 64, id 4931)
14:49:02.586423 roller.1026 > yakko-work.discard: . 2196617:2198065(1448) ack 1 win 17376 <nop,nop,timestamp 403 8642> (DF) (ttl 64, id 4932)
14:49:02.587676 roller.1026 > yakko-work.discard: P 2198065:2199513(1448) ack 1 win 17376 <nop,nop,timestamp 403 8642> (DF) (ttl 64, id 4933)
14:49:04.202214 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 406 8642> (DF) (ttl 64, id 5166)
14:49:05.202248 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 408 8642> (DF) (ttl 64, id 5167)
14:49:06.202232 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 410 8642> (DF) (ttl 64, id 5168)
14:49:07.202232 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 412 8642> (DF) (ttl 64, id 5169)
14:49:08.202225 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 414 8642> (DF) (ttl 64, id 5170)
14:49:09.202228 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 416 8642> (DF) (ttl 64, id 5171)
14:49:10.202253 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 418 8642> (DF) (ttl 64, id 5172)
14:49:11.202221 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 420 8642> (DF) (ttl 64, id 5173)
14:49:12.202203 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 422 8642> (DF) (ttl 64, id 5174)
14:49:13.202221 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 424 8642> (DF) (ttl 64, id 5175)
14:49:14.202212 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 426 8642> (DF) (ttl 64, id 5176)
14:49:15.202257 roller.1026 > yakko-work.discard: . 2518073:2519521(1448) ack 1 win 17376 <nop,nop,timestamp 428 8642> (DF) (ttl 64, id 5177)
14:49:16.201046 roller.1026 > yakko-work.discard: R 2535449:2535449(0) ack 1 win 17376 (DF) (ttl 64, id 5178)

I just fixed this in NetBSD.  The problem appears to be that
TCP_REXMTVAL can be 0 with the Brakmo-Peterson RTO estimator,
whereas with the Van Jacobson RTO estimator the lowest value it
could return was 3.  When the value is 0, the exponential backoff
product is also 0, so every timeout falls back to the minimum and
the backoff never takes effect.  After 12 retransmissions, the
connection simply times out.
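
The arithmetic can be sketched in a few lines of C.  This is not the
kernel code: the sk_* names, the tick values, and the backoff table
below are assumptions chosen for illustration (a 500 ms slow-timer
tick, a 2-tick floor, a 128-tick ceiling).  It only shows why a base
value of 0 defeats the backoff when the floor is applied after the
multiplication.

```c
/* Illustrative values in slow-timer ticks; assumed for this sketch. */
enum {
    SK_MAXRXTSHIFT = 12,   /* backoff steps before giving up */
    SK_RTTMIN      = 2,    /* assumed floor: 2 ticks (1 s at 500 ms/tick) */
    SK_REXMTMAX    = 128   /* assumed ceiling: 128 ticks (64 s) */
};

/* Assumed exponential backoff multipliers, indexed by backoff step. */
static const int sk_backoff[SK_MAXRXTSHIFT + 1] =
    { 1, 2, 4, 8, 16, 32, 64, 64, 64, 64, 64, 64, 64 };

/* Restrict v to the range [lo, hi]. */
int sk_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Timeout for one backoff step; the floor is applied only AFTER the
 * multiplication, so a base of 0 always yields SK_RTTMIN. */
int sk_rexmt_timeout(int base, int shift)
{
    return sk_clamp(base * sk_backoff[shift], SK_RTTMIN, SK_REXMTMAX);
}

/* Sum of the timeouts over backoff steps 1..SK_MAXRXTSHIFT, in ticks. */
int sk_total_ticks(int base)
{
    int shift, total = 0;

    for (shift = 1; shift <= SK_MAXRXTSHIFT; shift++)
        total += sk_rexmt_timeout(base, shift);
    return total;
}
```

With these assumed values, sk_total_ticks(0) is 24 ticks (12 seconds),
while sk_total_ticks(3) is 1082 ticks (about nine minutes).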

It looks like the persist code in tcp_timer.c has a fix for just
this problem, but the fix wasn't applied to the retransmission code...

>How-To-Repeat:

From the FreeBSD host:

ttcp -t -s -p9 target

Unplug the target from the net and watch how long the connection
takes to time out.  This didn't fail every time, but it was pretty
repeatable.

>Fix:

Apply some sort of check to TCP_REXMTVAL to ensure that it is
at least t_rttmin before multiplying it by the exponential
backoff term, as is currently done for the persist timer.
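
A minimal sketch of that check follows.  It is not the change that
was committed; the fx_* names, the tick values, and the backoff table
are assumptions for illustration only (a 500 ms slow-timer tick, a
2-tick floor, a 128-tick ceiling).  The point is that raising the
base value to the floor before multiplying lets the backoff grow even
when the estimator returns 0.

```c
/* Illustrative values in slow-timer ticks; assumed for this sketch. */
enum {
    FX_MAXRXTSHIFT = 12,   /* backoff steps before giving up */
    FX_RTTMIN      = 2,    /* assumed floor: 2 ticks (1 s at 500 ms/tick) */
    FX_REXMTMAX    = 128   /* assumed ceiling: 128 ticks (64 s) */
};

/* Assumed exponential backoff multipliers, indexed by backoff step. */
static const int fx_backoff[FX_MAXRXTSHIFT + 1] =
    { 1, 2, 4, 8, 16, 32, 64, 64, 64, 64, 64, 64, 64 };

/* Timeout for one backoff step, with the floor applied to the base
 * value BEFORE it is multiplied by the backoff term. */
int fx_rexmt_timeout(int base, int shift)
{
    int rexmt;

    if (base < FX_RTTMIN)
        base = FX_RTTMIN;              /* the proposed check */
    rexmt = base * fx_backoff[shift];
    if (rexmt > FX_REXMTMAX)
        rexmt = FX_REXMTMAX;           /* ceiling still applies */
    return rexmt;
}

/* Sum of the timeouts over backoff steps 1..FX_MAXRXTSHIFT, in ticks. */
int fx_total_ticks(int base)
{
    int shift, total = 0;

    for (shift = 1; shift <= FX_MAXRXTSHIFT; shift++)
        total += fx_rexmt_timeout(base, shift);
    return total;
}
```

With these assumed values, fx_total_ticks(0) is 1020 ticks (510
seconds) rather than the 24 ticks (12 seconds) that result when the
floor is applied only after the multiplication.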
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->wollman 
Responsible-Changed-By: phk 
Responsible-Changed-When: Sat Apr 11 14:47:31 PDT 1998 
Responsible-Changed-Why:  
care to analyse this one ? 
State-Changed-From-To: open->analyzed 
State-Changed-By: phk 
State-Changed-When: Sun Apr 12 01:00:51 PDT 1998 
State-Changed-Why:  
Davidg is on this one 


Responsible-Changed-From-To: wollman->davidg 
Responsible-Changed-By: phk 
Responsible-Changed-When: Sun Apr 12 01:00:51 PDT 1998 
Responsible-Changed-Why:  
because davidg is on it :-) 
State-Changed-From-To: analyzed->closed 
State-Changed-By: dg 
State-Changed-When: Fri Apr 24 02:25:41 PDT 1998 
State-Changed-Why:  
I fixed this in rev 1.43 of tcp_var.h. Thanks for the bug report. 
>Unformatted:
