From sec@42.org  Sat Jan 15 01:33:37 2011
Return-Path: <sec@42.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E1C6A106564A
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 15 Jan 2011 01:33:37 +0000 (UTC)
	(envelope-from sec@42.org)
Received: from ice.42.org (v6.42.org [IPv6:2001:608:9::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 9CDC98FC08
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 15 Jan 2011 01:33:37 +0000 (UTC)
Received: by ice.42.org (Postfix, from userid 1000)
	id A314E2845B; Sat, 15 Jan 2011 02:33:36 +0100 (CET)
Message-Id: <20110115013336.A314E2845B@ice.42.org>
Date: Sat, 15 Jan 2011 02:33:36 +0100 (CET)
From: Stefan `Sec` Zehl <sec@42.org>
Reply-To: Stefan `Sec` Zehl <sec@42.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: tcp "window probe" bug on 64bit
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         154006
>Category:       kern
>Synopsis:       [tcp] [patch] tcp "window probe" bug on 64bit
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    jhb
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 15 01:40:07 UTC 2011
>Closed-Date:    Thu Jul 07 18:52:16 UTC 2011
>Last-Modified:  Thu Jul 07 18:52:16 UTC 2011
>Originator:     Stefan `Sec` Zehl
>Release:        FreeBSD 8.1-STABLE amd64
>Organization:
>Environment:
System: FreeBSD ice 8.1-STABLE FreeBSD 8.1-STABLE #15: Mon Oct 25 12:20:38 CEST 2010 root@ice:/usr/obj/usr/src/sys/ICE amd64

As far as I can tell, the offending code is in all FreeBSD versions, not just
8-STABLE.

	
>Description:

On amd64 the PERSIST timer does not get started (and consequently executed)
for tcp connections stalled on a 0-size receive window. This means that no
single-byte probe packet is sent, so connections might hang indefinitely.

This is due to a missing (long) conversion in tcp_output.c around line 562
where "adv" is calculated. 

After this patch, amd64 behaves the same way as i386 again.


>How-To-Repeat:

Connect to a (broken) host which advertises window size 0 in the
SYN|ACK handshake packet, but increases the window size after the 3-way
handshake.

>Fix:

--- src/sys/netinet/tcp_output.c	2010-09-20 17:49:17.000000000 +0200
+++ src/sys/netinet/tcp_output.c	2011-01-14 19:30:46.000000000 +0100
@@ -571,7 +559,7 @@
 		 * TCP_MAXWIN << tp->rcv_scale.
 		 */
 		long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
-			(tp->rcv_adv - tp->rcv_nxt);
+			(long) (tp->rcv_adv - tp->rcv_nxt);
 
 		if (adv >= (long) (2 * tp->t_maxseg))
 			goto send;


>Release-Note:
>Audit-Trail:

From: Bruce Evans <brde@optusnet.com.au>
To: Stefan `Sec` Zehl <sec@42.org>
Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
Subject: Re: kern/154006: tcp "window probe" bug on 64bit
Date: Sat, 15 Jan 2011 16:31:10 +1100 (EST)

 On Sat, 15 Jan 2011, Stefan `Sec` Zehl wrote:
 
 >> Description:
 >
 > On amd64 the PERSIST timer does not get started (and consequently executed)
 > for tcp connections stalled on a 0-size receive window. This means that no
 > single-byte probe packet is sent, so connections might hang indefinitely.
 >
 > This is due to a missing (long) conversion in tcp_output.c around line 562
 > where "adv" is calculated.
 >
 > After this patch, amd64 behaves the same way as i386 again.
 
 >> Fix:
 >
 > --- src/sys/netinet/tcp_output.c	2010-09-20 17:49:17.000000000 +0200
 > +++ src/sys/netinet/tcp_output.c	2011-01-14 19:30:46.000000000 +0100
 > @@ -571,7 +559,7 @@
 > 		 * TCP_MAXWIN << tp->rcv_scale.
 > 		 */
 > 		long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
 > -			(tp->rcv_adv - tp->rcv_nxt);
 > +			(long) (tp->rcv_adv - tp->rcv_nxt);
 >
 > 		if (adv >= (long) (2 * tp->t_maxseg))
 > 			goto send;
 >
 
 Many other type errors are visible in this patch:
 - min() takes 'unsigned int' args, but is passed 'signed long' args:
    - recwin has type long.  This is smaller (same size but smaller max)
      than 'unsigned int' on 32-bit arches, and larger on 64-bit arches
    - TCP_MAXWIN has type int (except on 16-bit arches, which are not
      supported and are no longer permitted by POSIX).  Then we explicitly
      make its type incompatible with min() by casting to long.  The 16-bit
      arches don't matter, except they are responsible for many of the type
      errors here.  recwin is long and TCP_MAXWIN is cast to long since plain
      int was not long enough on 16-bit arches.
    Hopefully both of min()'s parameters are non-negative and <= UINT_MAX.
    Then nothing bad happens when min() converts them to u_int.  The result
    of min() has type u_int.
 - rcv_adv has type tcp_seq.  Seems correct
 - tcp_seq has type u_int32_t.  Seems correct, except for its old spelling.
    The spelling is not so old that it is u_long (to support the 16-bit arches),
    but it hasn't caught up with C99 yet.
 - rcv_nxt has type u_int32_t.  Seems logically incorrect -- should be tcp_seq.
 - (tp->rcv_adv - tp->rcv_nxt) has type [ the default promotion of { tcp_seq,
    u_int32_t } ].  This is u_int on all supported arches.  Apparently, the
    value of this should always be positive, since the cast doesn't change
    this on 64-bit arches.  However, the cast might break this on 32-bit
    arches (it breaks the value whenever it exceeds 0x80000000, if that can
    happen, since longs are smaller than u_int's on 32-bit arches).
 - the type of the expression for the rvalue is [ the default promotion of
    { u_int, u_int } ] in the old version, and the same with the last u_int
    replaced by long in the patched version.  It is most natural to subtract
    u_int's here, like the old version did -- everything in sight is (except
    for all the type errors) a sequence number or a difference of sequence
    numbers; the differences are always taken mod 2**32 and are non-negative,
    but must be careful if the difference should really be negative.  The
    SEQ_LT() family of macros can be used to determine if differences should
    be negative (this family is further towards losing 16-bitness -- it casts
    to int instead of to long).  Unfortunately there is no SEQ_DIFF() macro
    to simplify easy cases of taking differences.  I think there are scattered
    casts for this as here.
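 
 A minimal sketch of the SEQ_LT() idea mentioned in the last item, assuming
 the macros have the shape I remember from netinet/tcp_seq.h (take the
 difference mod 2**32, cast it to int, test the sign).  seq_before() is a
 made-up helper for this example only.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;	/* same width as the kernel's tcp_seq */

/* Sequence comparisons: difference mod 2**32, then look at its sign. */
#define SEQ_LT(a, b)	((int)((a) - (b)) < 0)
#define SEQ_GT(a, b)	((int)((a) - (b)) > 0)

/* Hypothetical helper: nonzero if a is sequentially before b, even across a wrap. */
int
seq_before(tcp_seq a, tcp_seq b)
{
	return (SEQ_LT(a, b));
}
```

 The cast to int depends on the usual 2's complement wraparound, so a
 sequence number just below 2**32 still compares as "before" a small one.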
 
 So casting to long is not good.  It gives another type error to analyse,
 and works accidentally.
 
 Further analysis: without the patch:
 
  		long adv = x - y;
 
 where x has type u_int and y has type u_int.  The difference always has
 type u_int; if x is sequentially less than y, then the difference should
 be negative, but its type forces it to be positive.  We should use
 SEQ_FOO() if this is possible, or we can use delicate conversions if we
 do only 2 pages of analysis per line to justify the delicacies (not too
 bad if there is a macro for this).
 
 - On 32-bit arches, long is smaller than u_int, so the assignment overflows
    if the difference should have been negative.  The result is
    implementation-defined, but on normal 2's complement arches, it is benign
    and fixes up the sign error.
 
 - On 64-bit arches, long is larger than u_int, so the difference remains
    nonnegative when it should have been negative, and is normally huge
    (something like 0U - 1U = 0xFFFFFFFF).  The huge value is near UINT_MAX.
    LONG_MAX is much larger, so the assignment doesn't overflow and the
    value remains near UINT_MAX.
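 
 The two cases can be modelled in userland with fixed-width types standing
 in for the two sizes of long (int32_t for i386, int64_t for amd64).  The
 function names are invented for this sketch, and the 32-bit result assumes
 the normal 2's complement wraparound described above.

```c
#include <stdint.h>

/* `long adv = x - y;` where long is 32 bits (i386): the conversion wraps. */
int32_t
adv_long32(uint32_t x, uint32_t y)
{
	return ((int32_t)(x - y));
}

/* The same statement where long is 64 bits (amd64): the huge value is kept. */
int64_t
adv_long64(uint32_t x, uint32_t y)
{
	return ((int64_t)(x - y));
}
```

 With x = 0 and y = 1 the first gives -1 and the second gives 0xFFFFFFFF, so
 a later test of the form (adv >= 2 * t_maxseg) is taken only with the
 64-bit long.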
 
 With the patch:
 
  		long adv = x - (long)y;
 
 where x has type u_int and (long)y has type long:
 
 - On 32-bit arches, long is smaller than u_int, so (long)y may overflow;
    overflow gives an implementation-defined result which happens to be
    benign.  Then the binary promotions apply.  Although I have been
    describing long as
    being smaller than u_int on 32-bit arches, in the C type system it is
    logically larger, so the binary promotions promote x to long too, and
    leave (long)y unchanged.  "Promotion" of x is really demotion, so it
    may overflow benignly just like for y.  I think the difference doesn't
    overflow, and even if it does then the result is the same as before,
    since everything will be done in 32-bit registers using the same code
    as before.
 
 - On 64-bit arches: long is larger than u_int, so (long)y doesn't change
    the value of y.  The binary promotions then promote x to long without
    changing its value, and don't change (long)y's type or value.  Both
    terms remain nonnegative.  (long)y can still be garbage -- something
    like 0xFFFFFFFF when it should be -1.  I think this causes problems,
    but much smaller than before.  Oops, the above may be wrong about y possibly
    wanting to be negative.  Things are not quite as complicated if this
    sequence cannot occur:
    - if this can occur, then (x - (long)y) is a large negative number when
      it should be a small positive number (not much larger than x).  This
      doesn't seem to be what causes the main problem.
    - the main problem is just when x < y.  Then (x - y) gives a huge
      unsigned int value (which bogusly assigning to a long doesn't fix
      up for the 64-bit case).  But (x - (long)y) gives a negative value
      when x < y, without additional type errors or overflows on either
      32-bit or 64-bit arches provided x and y are not very large.
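 
 A sketch of the patched expression for the 64-bit case only, again with
 int64_t standing in for long; adv_patched_long64() is a made-up name.

```c
#include <stdint.h>

/*
 * `long adv = x - (long)y;` with a 64-bit long: y is widened by the cast,
 * x is widened by the arithmetic conversions, then they are subtracted.
 */
int64_t
adv_patched_long64(uint32_t x, uint32_t y)
{
	return ((int64_t)x - (int64_t)y);
}
```

 Both operands keep their nonnegative values, so x < y now gives a small
 negative result instead of something near UINT_MAX.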
 
 Better fixes:
 
 (A) explicitly convert to int instead of implicitly converting to long:
 
  		long adv = (int)
  		    (min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
  		    (tp->rcv_adv - tp->rcv_nxt));
 
 or more complete fixes for type errors (beware of things needing to remain
 bogusly long):
 
  		/* Also change recwin to int32_t. */
  		int adv = imin(recwin, TCP_MAXWIN << tp->rcv_scale) -
  		    (int)(tp->rcv_adv - tp->rcv_nxt);
 
 This doesn't fix some style bugs:
 - nested declaration.
 - initialization in declaration
 
 tcp code already uses scattered conversions like this a bit too much.  E.g.,
 in tcp_input.c, there is one imax() very like the above imin().  This seems
 to be the only one involving the window, however; it initializes `win'
 which already has type int, but some other window variables have type
 u_int...
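 
 To show why the argument types matter, here are stand-ins shaped like the
 kernel's min() (u_int arguments) and imin() (int arguments).  The model_
 names are invented so that the sketch does not collide with anything real.

```c
/* Shaped like the kernel's min(): both arguments are converted to unsigned int. */
unsigned int
model_min(unsigned int a, unsigned int b)
{
	return (a < b ? a : b);
}

/* Shaped like the kernel's imin(): both arguments stay signed. */
int
model_imin(int a, int b)
{
	return (a < b ? a : b);
}
```

 A negative argument becomes a huge unsigned value on its way into
 model_min() and so never wins the comparison, while model_imin() returns
 it unchanged.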
 
 Later code in tcp_output uses bogus casts to long and larger code instead:
 
 % 	if (recwin < (long)(tp->rcv_adv - tp->rcv_nxt))
 % 		recwin = (long)(tp->rcv_adv - tp->rcv_nxt);
 % 	if (recwin > (long)TCP_MAXWIN << tp->rcv_scale)
 % 		recwin = (long)TCP_MAXWIN << tp->rcv_scale;
 % 	...
 % 	if (recwin > 0 && SEQ_GT(tp->rcv_nxt + recwin, tp->rcv_adv))
 % 		tp->rcv_adv = tp->rcv_nxt + recwin;
 
 Note that the first statement avoids using the technically incorrect
 SEQ_FOO() although its internals are better (cast to int instead of
 long).  It uses cases essentially like yours.  Then further analysis
 is simpler because everything is converted to long.  The second statement
 is similar to the first half of the broken expression.  Large code using
 if's and else's and tests (x >= y) before subtracting y from x is much
 easier to get right than 1 complicated 1-statement expression like the
 broken one.  It takes these (x >= y) tests to make code with mixed types
 obviously correct.  But I prefer small fast code with ints for everything,
 since type analysis is too hard.
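 
 A sketch of the test-before-subtracting style, assuming a result type wide
 enough for any 32-bit difference (int64_t here; checked_diff() is a made-up
 name, not existing kernel code).

```c
#include <stdint.h>

/*
 * Signed difference x - y of two 32-bit unsigned quantities, computed by
 * testing (x >= y) first so that no intermediate value wraps.
 */
int64_t
checked_diff(uint32_t x, uint32_t y)
{
	if (x >= y)
		return ((int64_t)(x - y));
	return (-(int64_t)(y - x));
}
```

 This is longer than the one-line expression, but it gives the same answer
 whatever the width of long is.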
 
 (B) Use SEQ_FOO().  This can be used for the difference of the sequence
 numbers, but using it on the final difference is not quite right since
 neither x nor y is a sequence number.  In practice SEQ_LT(x, y) will work.
 
 (C) Put (A) or (B) in a macro.  It can depend on benign overflow, or test
 values if necessary.  All this macro is about is subtracting 2 sequence
 values, or possibly differences of and bounds of sequence values, with
 a result that is negative iff that is needed, and a type that is signed
 iff a negative value makes sense or can be handled by the caller (int
 should do for the signed cases, else the type should remain tcp_seq or
 its promotion).  Using ints for tcp_seq is technically invalid since
 they overflow at value INT_MAX.
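 
 One possible shape for such a macro, as a sketch only.  SEQ_DIFF() does not
 exist in the tree, window_gain() is a made-up caller, and the cast to int
 depends on benign 2's complement wraparound.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;	/* same width as the kernel's tcp_seq */

/* Hypothetical: signed distance from b to a, taken mod 2**32 and cast to int. */
#define SEQ_DIFF(a, b)	((int)((tcp_seq)(a) - (tcp_seq)(b)))

/* Hypothetical caller: how much further the window could be advertised. */
int
window_gain(int avail, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return (avail - SEQ_DIFF(rcv_adv, rcv_nxt));
}
```

 Everything is an int by the time the subtraction happens, so the result is
 negative exactly when rcv_adv is further ahead of rcv_nxt than avail.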
 
 Bruce

From: Bruce Evans <brde@optusnet.com.au>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Stefan `Sec` Zehl <sec@42.org>, freebsd-bugs@freebsd.org,
        FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/154006: tcp "window probe" bug on 64bit
Date: Sat, 15 Jan 2011 17:05:02 +1100 (EST)

 On Sat, 15 Jan 2011, Bruce Evans wrote:
 
 > ...
 > Later code in tcp_output uses bogus casts to long and larger code instead:
 >
 > % 	if (recwin < (long)(tp->rcv_adv - tp->rcv_nxt))
 > % 		recwin = (long)(tp->rcv_adv - tp->rcv_nxt);
 > % 	if (recwin > (long)TCP_MAXWIN << tp->rcv_scale)
 > % 		recwin = (long)TCP_MAXWIN << tp->rcv_scale;
 > % 	...
 > % 	if (recwin > 0 && SEQ_GT(tp->rcv_nxt + recwin, tp->rcv_adv))
 > % 		tp->rcv_adv = tp->rcv_nxt + recwin;
 >
 > Note that the first statement avoids using the technically incorrect
 > SEQ_FOO() although its internals are better (cast to int instead of
 > long).  It uses cases essentially like yours.  Then further analysis
                    ^^^^^ a cast
 > is simpler because everything is converted to long.  The second statement
 > is similar to the first half of the broken expression.  Large code using
 > if's and else's and tests (x >= y) before subtracting y from x is much
 > easier to get right than 1 complicated 1-statement expression like the
 > broken one.  It takes these (x >= y) tests to make code with mixed types
 > obviously correct.  But I prefer small fast code with ints for everything,
 > since type analysis is too hard.
 
 But the casts to long are not good.  Here they have no effect except
 to break the warning about the bad type of `recwin'.  recwin has type
 long, so assignment to it does the same conversion as the cast, possibly
 with a warning about the implicit conversion if it might overflow (can
 overflow only on 32-bit arches).  The code depends on rcv_adv being
 sequentially >= rcv_nxt with or without the cast.  Otherwise, the
 difference is huge unsigned int, and the cast only changes this (by
 benign overflow) on 32-bit arches.
 
 This can be fixed by casting to int instead of long (now the cast may have
 an effect, and breaking the warning may be intentional), or my proposed
 SEQ_DIFF() macros work well here:
 
  	int recwin, sd;
  	...
  	sd = SEQ_NONNEG_NONLARGE_DIFF(tp->rcv_adv, tp->rcv_nxt);
  	/*
  	 * SEQ_DIFF() supports negative differences;
  	 * SEQ_NONNEG_NONLARGE_DIFF() KASSERT()s that they don't happen and
  	 * are not too large.  This name is too long.
  	 */
  	if (recwin < sd)
  		recwin = sd;
  	...
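 
 A userland sketch of how the asserting variant might look.  The macro is
 only a proposal, so everything here is hypothetical: assert() stands in for
 KASSERT(), "not too large" is taken to mean <= INT_MAX, and the function
 names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uint32_t tcp_seq;	/* same width as the kernel's tcp_seq */

/* Hypothetical stand-in for SEQ_NONNEG_NONLARGE_DIFF(). */
int
seq_nonneg_nonlarge_diff(tcp_seq a, tcp_seq b)
{
	tcp_seq d = a - b;	/* difference mod 2**32 */

	/* A sequentially negative difference shows up as a value above INT_MAX. */
	assert(d <= (tcp_seq)INT_MAX);
	return ((int)d);
}

/* The recwin adjustment from the fragment above, written as a function. */
int
raise_recwin(int recwin, tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	int sd = seq_nonneg_nonlarge_diff(rcv_adv, rcv_nxt);

	if (recwin < sd)
		recwin = sd;
	return (recwin);
}
```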
 
 Bruce
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Jan 15 08:37:18 UTC 2011 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=154006 
State-Changed-From-To: open->patched 
State-Changed-By: jhb 
State-Changed-When: Wed Mar 30 12:38:33 UTC 2011 
State-Changed-Why:  
Fix committed to HEAD. 


Responsible-Changed-From-To: freebsd-net->jhb 
Responsible-Changed-By: jhb 
Responsible-Changed-When: Wed Mar 30 12:38:33 UTC 2011 
Responsible-Changed-Why:  
Fix committed to HEAD. 


From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/154006: commit references a PR
Date: Wed, 30 Mar 2011 12:35:59 +0000 (UTC)

 Author: jhb
 Date: Wed Mar 30 12:35:39 2011
 New Revision: 220156
 URL: http://svn.freebsd.org/changeset/base/220156
 
 Log:
   Clamp the initial advertised receive window when responding to a SYN/ACK
   to the maximum allowed window.  Growing the window too large would cause
   an underflow in the calculations in tcp_output() to decide if a window
   update should be sent which would prevent the persist timer from being
   started if data was pending and the other end of the connection advertised
   an initial window size of 0.
   
   PR:		kern/154006
   Submitted by:	Stefan `Sec` Zehl  sec 42 org
   Reviewed by:	bz
   MFC after:	1 week
 
 Modified:
   head/sys/netinet/tcp_input.c
 
 Modified: head/sys/netinet/tcp_input.c
 ==============================================================================
 --- head/sys/netinet/tcp_input.c	Wed Mar 30 11:34:40 2011	(r220155)
 +++ head/sys/netinet/tcp_input.c	Wed Mar 30 12:35:39 2011	(r220156)
 @@ -1756,7 +1756,8 @@ tcp_do_segment(struct mbuf *m, struct tc
  				(TF_RCVD_SCALE|TF_REQ_SCALE)) {
  				tp->rcv_scale = tp->request_r_scale;
  			}
 -			tp->rcv_adv += tp->rcv_wnd;
 +			tp->rcv_adv += imin(tp->rcv_wnd,
 +			    TCP_MAXWIN << tp->rcv_scale);
  			tp->snd_una++;		/* SYN is acked */
  			/*
  			 * If there's data, delay ACK; if there's also a FIN
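 
 A userland sketch of what the clamp in this change does.  TCP_MAXWIN is
 given its usual value of 65535; model_imin() and clamped_rcv_adv() are
 made-up names, and rcv_wnd is simplified to an int for the example.

```c
#include <stdint.h>

#define TCP_MAXWIN	65535	/* largest unscaled window */

/* Shaped like the kernel's imin(). */
int
model_imin(int a, int b)
{
	return (a < b ? a : b);
}

/* Advance rcv_adv by the receive window, clamped to TCP_MAXWIN << rcv_scale. */
uint32_t
clamped_rcv_adv(uint32_t rcv_adv, int rcv_wnd, int rcv_scale)
{
	return (rcv_adv + (uint32_t)model_imin(rcv_wnd, TCP_MAXWIN << rcv_scale));
}
```

 With the clamp, rcv_adv - rcv_nxt can no longer start out larger than
 TCP_MAXWIN << rcv_scale, which is the bound used for the first term of the
 adv calculation in tcp_output().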
 
State-Changed-From-To: patched->closed 
State-Changed-By: jhb 
State-Changed-When: Thu Jul 7 18:51:59 UTC 2011 
State-Changed-Why:  
Fix merged to 7 and 8 back in April. 

>Unformatted:
