From nobody@FreeBSD.org  Thu Feb  1 23:39:09 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id A702216A401
	for <freebsd-gnats-submit@FreeBSD.org>; Thu,  1 Feb 2007 23:39:09 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [69.147.83.33])
	by mx1.freebsd.org (Postfix) with ESMTP id 9CE3813C441
	for <freebsd-gnats-submit@FreeBSD.org>; Thu,  1 Feb 2007 23:39:09 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id l11Nd9BF049933
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 1 Feb 2007 23:39:09 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id l11Nd9nf049932;
	Thu, 1 Feb 2007 23:39:09 GMT
	(envelope-from nobody)
Message-Id: <200702012339.l11Nd9nf049932@www.freebsd.org>
Date: Thu, 1 Feb 2007 23:39:09 GMT
From: dave baukus<david.baukus@us.fujitsu.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: TCP connection ETIMEDOUT
X-Send-Pr-Version: www-3.0

>Number:         108670
>Category:       kern
>Synopsis:       [tcp] TCP connection ETIMEDOUT
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    silby
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 01 23:40:18 GMT 2007
>Closed-Date:    
>Last-Modified:  Sun Aug 18 12:20:00 UTC 2013
>Originator:     dave baukus
>Release:        FreeBSD 6.1
>Organization:
FNC
>Environment:
FreeBSD krakatoa 6.1-RELEASE FreeBSD 6.1-RELEASE #23: Wed Aug  9 13:33:37 CDT 2006     dbaukus@krakatoa:/home/dbaukus/kern/i386/compile/KRAKATOA-FNC  i386

>Description:
There is a bug in tcp_output(), in at least FreeBSD 6.1,
that causes a perfectly good TCP connection to be dropped by its
retransmit timer; the application receives ETIMEDOUT.

Consider a TCP that never transmits data (the receiving end of the ttcp
utility is an example). While the connection is established,
snd_max == snd_una == snd_nxt == (iss + 1), and the retransmit
timer should never be started. If the retransmit timer is started
anyway, it is never stopped by tcp_input()/tcp_output(), because
snd_max == snd_una == snd_nxt always holds. Once started, the
timer keeps counting up until tp->t_rxtshift == 12, and
the connection that never transmitted is falsely killed.
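
To illustrate why the timer is never stopped on such a connection, here is a
minimal userland sketch. It is a simplified model, not kernel code: struct
toy_tcpcb, toy_ack_input() and toy_still_armed_after() are hypothetical names,
and the ACK handling is reduced to the single assumption that the timer is
only stopped when an ACK covers new data and reaches snd_max.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe sequence comparison, in the style of the BSD SEQ_GT macro. */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Hypothetical, trimmed-down stand-in for the kernel's struct tcpcb. */
struct toy_tcpcb {
    tcp_seq snd_una;     /* oldest unacknowledged sequence number */
    tcp_seq snd_max;     /* highest sequence number sent */
    bool    rexmt_armed; /* stands in for callout_active(tp->tt_rexmt) */
};

/*
 * Simplified ACK processing (an assumption of this sketch): the timer is
 * stopped only when the ACK acknowledges new data and reaches snd_max.
 */
static void
toy_ack_input(struct toy_tcpcb *tp, tcp_seq th_ack)
{
    if (!SEQ_GT(th_ack, tp->snd_una))
        return;                     /* nothing new acknowledged */
    tp->snd_una = th_ack;
    if (th_ack == tp->snd_max)
        tp->rexmt_armed = false;    /* all outstanding data acked */
}

/*
 * Feed `segments` incoming segments to a copy of the control block. The peer
 * of a never-transmitting TCP always acknowledges the same value (snd_una),
 * so the stop path above is never taken.
 */
static bool
toy_still_armed_after(struct toy_tcpcb tp, int segments)
{
    for (int i = 0; i < segments; i++)
        toy_ack_input(&tp, tp.snd_una);
    return tp.rexmt_armed;
}
```

With snd_una == snd_max and the timer armed, toy_still_armed_after() stays
true no matter how many segments arrive, whereas a sender with outstanding
data has its timer cleared by the first ACK that reaches snd_max.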

The bug is to blindly rely on the return value of ip_output().
If ip_output() returns ENOBUFS then the retransmit timer is
activated:

From the end of tcp_output():
out:
        SOCKBUF_UNLOCK_ASSERT(&so->so_snd);    /* Check gotos. */
        if (error == ENOBUFS) {
                if (!callout_active(tp->tt_rexmt) &&
                    !callout_active(tp->tt_persist))
                        callout_reset(tp->tt_rexmt, tp->t_rxtcur,
                            tcp_timer_rexmt, tp);
                tp->snd_cwnd = tp->t_maxseg;
                return (0);
        }

My simple-minded fix would be not to start the retransmit timer here;
if tcp_output() had wanted to time this transmission, it would have
started the timer earlier in the function.
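
As a rough sketch of what a change in this direction could look like, the toy
model below contrasts the ENOBUFS handling quoted above with a guarded variant.
Note that the guard (arm only when snd_max != snd_una) is an assumption of this
sketch and is narrower than the fix proposed in this report, which is simply
not to start the timer at all. struct toy_tcpcb and both function names are
hypothetical; this is userland code, not a kernel patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Hypothetical, trimmed-down stand-in for the kernel's struct tcpcb. */
struct toy_tcpcb {
    tcp_seq  snd_una;       /* oldest unacknowledged sequence number */
    tcp_seq  snd_max;       /* highest sequence number sent */
    bool     rexmt_armed;   /* stands in for callout_active(tp->tt_rexmt) */
    bool     persist_armed; /* stands in for callout_active(tp->tt_persist) */
    unsigned snd_cwnd;      /* congestion window */
    unsigned t_maxseg;      /* maximum segment size */
};

/* Behaviour quoted in the report: arm on ENOBUFS unless a timer is running. */
static void
toy_enobufs_original(struct toy_tcpcb *tp)
{
    if (!tp->rexmt_armed && !tp->persist_armed)
        tp->rexmt_armed = true;
    tp->snd_cwnd = tp->t_maxseg;
}

/*
 * Guarded variant (an assumption of this sketch): arm the timer only if
 * there is unacknowledged data that a retransmission could cover.
 */
static void
toy_enobufs_guarded(struct toy_tcpcb *tp)
{
    if (tp->snd_max != tp->snd_una &&
        !tp->rexmt_armed && !tp->persist_armed)
        tp->rexmt_armed = true;
    tp->snd_cwnd = tp->t_maxseg;
}
```

For a pure receiver (snd_max == snd_una) the original handler arms the timer
and the guarded one does not; both still collapse snd_cwnd to one segment.
Whether the guard is safe for every path through tcp_output() is not
established here.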

This ETIMEDOUT problem is easily reproduced on any old machine
using a single slow Ethernet device and the ttcp test utility.
First, start a couple of ttcp receivers. Second, flood the same
interface with enough ttcp transmitters to cause the driver's transmit
ring and interface queue to back up. Eventually, one of the ttcp
receivers will get ENOBUFS from ip_output(), and the retransmit
timer will be wrongly activated for a pure ACK segment.

I was able to reproduce it with the following on FreeBSD 6.1:

box1:
ttcp -s -l 16384 -p 9444 -v -b 128000 -r
ttcp -s -l 16384 -p 9445 -v -b 128000 -r
ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9447 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9448 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9449 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9450 -v -b 128000 -t 192.168.222.13

box2:
ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t  192.168.222.222
ttcp -s -n 9999999 -l 128  -p 9445 -v -b 128000  -t  192.168.222.222
ttcp -s -l 16384 -p 9446 -v -b 128000 -r
ttcp -s -l 16384 -p 9447 -v -b 128000 -r
ttcp -s -l 16384 -p 9448 -v -b 128000 -r
ttcp -s -l 16384 -p 9449 -v -b 128000 -r
ttcp -s -l 16384 -p 9450 -v -b 128000 -r 
>How-To-Repeat:

See the ttcp commands for box1 and box2 at the end of the Description.
>Fix:
Do not start the retransmit timer based on error codes from ip_output()?
>Release-Note:
>Audit-Trail:

From: Dave Baukus <david.baukus@us.fujitsu.com>
To: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Cc:  
Subject: Re: kern/108670: TCP connection ETIMEDOUT
Date: Fri, 02 Feb 2007 09:30:14 -0600

 I realized late last night that I was wrong about a few
 details concerning this bug:
 
 1.) The retransmit timer does not keep firing without
 being restarted.
 
 2.) ip_output() must return ENOBUFS (TCP_MAXRXTSHIFT + 1) times
 to the same non-transmitting TCP.
 
 3.) Given a TCP as described in the original report, when tcp_output()
 uses ENOBUFS to blindly start the retransmit timer, tp->t_rxtshift is
 falsely incremented and never cleared.
 
 Thus the bug manifests itself because, for a TCP that
 never transmits, nothing ever clears tp->t_rxtshift.
 This allows tp->t_rxtshift to slowly count up to TCP_MAXRXTSHIFT;
 once TCP_MAXRXTSHIFT is exceeded, tcp_timer_rexmt() will
 kill the poor innocent TCP.
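
The count-up described above can be modelled with a small userland sketch.
It is an illustration under stated assumptions, not kernel code: the toy_*
names are hypothetical, TOY_MAXRXTSHIFT mirrors TCP_MAXRXTSHIFT (12), each
ENOBUFS arms the timer once, each expiry increments t_rxtshift, and nothing
ever resets it because the TCP never transmits data.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_MAXRXTSHIFT 12  /* mirrors TCP_MAXRXTSHIFT */

/* Hypothetical, trimmed-down stand-in for the kernel's struct tcpcb. */
struct toy_tcpcb {
    int  t_rxtshift;   /* retransmit backoff counter */
    bool rexmt_armed;  /* stands in for callout_active(tp->tt_rexmt) */
    int  so_error;     /* 0 while the connection is alive */
};

/* ENOBUFS path: arm the retransmit timer if it is not already running. */
static void
toy_enobufs(struct toy_tcpcb *tp)
{
    if (!tp->rexmt_armed)
        tp->rexmt_armed = true;
}

/*
 * Timer expiry: bump the shift count and drop the connection once it
 * exceeds the maximum. There is nothing to retransmit, so the timer is
 * not re-armed here; only the next ENOBUFS arms it again.
 */
static void
toy_timer_rexmt(struct toy_tcpcb *tp)
{
    tp->rexmt_armed = false;
    if (++tp->t_rxtshift > TOY_MAXRXTSHIFT) {
        tp->t_rxtshift = TOY_MAXRXTSHIFT;
        tp->so_error = ETIMEDOUT;
    }
}

/*
 * Return how many ENOBUFS events it takes to kill the connection,
 * or -1 if it survives max_events of them.
 */
static int
toy_events_until_drop(int max_events)
{
    struct toy_tcpcb tp = { 0, false, 0 };

    for (int n = 1; n <= max_events; n++) {
        toy_enobufs(&tp);
        toy_timer_rexmt(&tp);
        if (tp.so_error == ETIMEDOUT)
            return n;
    }
    return -1;
}
```

In this model the connection survives 12 ENOBUFS events and is dropped on the
13th, which matches the (TCP_MAXRXTSHIFT + 1) figure in point 2.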
 
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Apr 24 02:50:19 UTC 2007 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=108670 
Responsible-Changed-From-To: freebsd-net->andre 
Responsible-Changed-By: andre 
Responsible-Changed-When: Sun May 13 18:38:09 UTC 2007 
Responsible-Changed-Why:  
Take over. 

Responsible-Changed-From-To: andre->silby 
Responsible-Changed-By: kmacy 
Responsible-Changed-When: Thu Nov 15 23:24:05 UTC 2007 
Responsible-Changed-Why:  

silby - this sounds like a bug you fixed not too long ago - please take a look 


From: Till Toenges <tt@kyon.de>
To: bug-followup@FreeBSD.org, david.baukus@us.fujitsu.com
Cc:  
Subject: Re: kern/108670: [tcp] TCP connection ETIMEDOUT
Date: Sun, 18 Aug 2013 14:10:06 +0200

 I think I've been hit by this. I experimented with a more recent
 version of tcp_output.c (FreeBSD 9.1) and believe the described
 behaviour still exists. But without knowing much more about TCP and the
 FreeBSD kernel, I cannot create a real patch that won't break other
 cases. Has anybody worked on this since 2007?
 
 Till
>Unformatted:
Is this still a problem?
