From nobody@FreeBSD.org  Sat Apr 10 11:05:33 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B9D72106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 10 Apr 2010 11:05:33 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id A832E8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 10 Apr 2010 11:05:33 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o3AB5WRb008860
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 10 Apr 2010 11:05:32 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o3AB5Wnc008859;
	Sat, 10 Apr 2010 11:05:32 GMT
	(envelope-from nobody)
Message-Id: <201004101105.o3AB5Wnc008859@www.freebsd.org>
Date: Sat, 10 Apr 2010 11:05:32 GMT
From: Richard Scheffenegger <rs@netapp.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: TCP/ECN behaves different to CE/CWR than ns2 reference
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         145600
>Category:       kern
>Synopsis:       TCP/ECN behaves different to CE/CWR than ns2 reference
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-net
>State:          patched
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Apr 10 11:10:02 UTC 2010
>Closed-Date:    
>Last-Modified:  Wed Oct 12 14:50:47 UTC 2011
>Originator:     Richard Scheffenegger
>Release:        8.0-RC2
>Organization:
NetApp
>Environment:
FreeBSD rsFreeBSD.vie.demo 8.0-RC2 FreeBSD 8.0-RC2 #16: Sat Nov 14 22:25:28 CET 2009 root@rsFreeBSD.vie.demo:/usr/obj/usr/src/sys/GENERIC i386
>Description:
Used TBIT (www.icir.org/tbit/) to verify the RFC3168 compliance of the FreeBSD
ECN implementation.

First, the stock IP stack filters out the CE and ECT(1) codepoints, even when TCP has successfully negotiated ECN.

This can be seen in the "netstat -sp tcp" output quoted in the Unformatted section below.

>How-To-Repeat:
Run the NewECN test of TBIT against an ECN-enabled FreeBSD stack (with CE/ECT(1) filtering in the IP stack deactivated).

Similarly, injecting a CWR/CE data segment into an ECN-enabled TCP stream demonstrates the problem.
>Fix:
rsFreeBSD# diff -U 10 netinet.orig/tcp_input.c netinet/tcp_input.c
--- netinet.orig/tcp_input.c    2009-10-25 02:10:29.000000000 +0100
+++ netinet/tcp_input.c 2010-04-10 10:30:12.000000000 +0200
@@ -1127,36 +1127,37 @@
        /*
         * Unscale the window into a 32-bit value.
         * For the SYN_SENT state the scale is zero.
         */
        tiwin = th->th_win << tp->snd_scale;

        /*
         * TCP ECN processing.
         */
        if (tp->t_flags & TF_ECN_PERMIT) {
+
+               if (thflags & TH_CWR)
+                        tp->t_flags &= ~TF_ECN_SND_ECE;
+
                switch (iptos & IPTOS_ECN_MASK) {
                case IPTOS_ECN_CE:
                        tp->t_flags |= TF_ECN_SND_ECE;
                        TCPSTAT_INC(tcps_ecn_ce);
                        break;
                case IPTOS_ECN_ECT0:
                        TCPSTAT_INC(tcps_ecn_ect0);
                        break;
                case IPTOS_ECN_ECT1:
                        TCPSTAT_INC(tcps_ecn_ect1);
                        break;
                }

-               if (thflags & TH_CWR)
-                       tp->t_flags &= ~TF_ECN_SND_ECE;
-
                /*
                 * Congestion experienced.
                 * Ignore if we are already trying to recover.
                 */
                if ((thflags & TH_ECE) &&
                    SEQ_LEQ(th->th_ack, tp->snd_recover)) {
                        TCPSTAT_INC(tcps_ecn_rcwnd);
                        tcp_congestion_exp(tp);
                }
        }



>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->patched 
State-Changed-By: rpaulo 
State-Changed-When: Sat Apr 10 12:47:30 UTC 2010 
State-Changed-Why:  
Fixed in HEAD. MFC pending. Thanks! 


Responsible-Changed-From-To: freebsd-bugs->rpaulo 
Responsible-Changed-By: rpaulo 
Responsible-Changed-When: Sat Apr 10 12:47:30 UTC 2010 
Responsible-Changed-Why:  
Fixed in HEAD. MFC pending. Thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=145600 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/145600: commit references a PR
Date: Sat, 10 Apr 2010 12:47:21 +0000 (UTC)

 Author: rpaulo
 Date: Sat Apr 10 12:47:06 2010
 New Revision: 206456
 URL: http://svn.freebsd.org/changeset/base/206456
 
 Log:
   Honor the CE bit even when the CWR bit is set.
   
   PR:		145600
   Submitted by:	Richard Scheffenegger <rs at netapp.com>
   MFC after:	1 week
 
 Modified:
   head/sys/netinet/tcp_input.c
 
 Modified: head/sys/netinet/tcp_input.c
 ==============================================================================
 --- head/sys/netinet/tcp_input.c	Sat Apr 10 12:29:09 2010	(r206455)
 +++ head/sys/netinet/tcp_input.c	Sat Apr 10 12:47:06 2010	(r206456)
 @@ -1134,6 +1134,8 @@ tcp_do_segment(struct mbuf *m, struct tc
  	 * TCP ECN processing.
  	 */
  	if (tp->t_flags & TF_ECN_PERMIT) {
 +		if (thflags & TH_CWR)
 +			tp->t_flags &= ~TF_ECN_SND_ECE;
  		switch (iptos & IPTOS_ECN_MASK) {
  		case IPTOS_ECN_CE:
  			tp->t_flags |= TF_ECN_SND_ECE;
 @@ -1146,10 +1148,6 @@ tcp_do_segment(struct mbuf *m, struct tc
  			TCPSTAT_INC(tcps_ecn_ect1);
  			break;
  		}
 -
 -		if (thflags & TH_CWR)
 -			tp->t_flags &= ~TF_ECN_SND_ECE;
 -
  		/*
  		 * Congestion experienced.
  		 * Ignore if we are already trying to recover.
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/145600: commit references a PR
Date: Sat, 17 Apr 2010 17:40:27 +0000 (UTC)

 Author: rpaulo
 Date: Sat Apr 17 17:40:12 2010
 New Revision: 206762
 URL: http://svn.freebsd.org/changeset/base/206762
 
 Log:
   MFC r206456:
    Honor the CE bit even when the CWR bit is set.
   
    PR:		145600
    Submitted by:	Richard Scheffenegger <rs at netapp.com>
 
 Modified:
   stable/8/sys/netinet/tcp_input.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
 
 Modified: stable/8/sys/netinet/tcp_input.c
 ==============================================================================
 --- stable/8/sys/netinet/tcp_input.c	Sat Apr 17 17:02:17 2010	(r206761)
 +++ stable/8/sys/netinet/tcp_input.c	Sat Apr 17 17:40:12 2010	(r206762)
 @@ -1134,6 +1134,8 @@ tcp_do_segment(struct mbuf *m, struct tc
  	 * TCP ECN processing.
  	 */
  	if (tp->t_flags & TF_ECN_PERMIT) {
 +		if (thflags & TH_CWR)
 +			tp->t_flags &= ~TF_ECN_SND_ECE;
  		switch (iptos & IPTOS_ECN_MASK) {
  		case IPTOS_ECN_CE:
  			tp->t_flags |= TF_ECN_SND_ECE;
 @@ -1146,10 +1148,6 @@ tcp_do_segment(struct mbuf *m, struct tc
  			TCPSTAT_INC(tcps_ecn_ect1);
  			break;
  		}
 -
 -		if (thflags & TH_CWR)
 -			tp->t_flags &= ~TF_ECN_SND_ECE;
 -
  		/*
  		 * Congestion experienced.
  		 * Ignore if we are already trying to recover.
 
Responsible-Changed-From-To: rpaulo->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Dec 4 16:18:13 UTC 2010 
Responsible-Changed-Why:  
rpaulo has returned his commit bit for safekeeping. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=145600 
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: remko 
Responsible-Changed-When: Wed Oct 12 14:50:09 UTC 2011 
Responsible-Changed-Why:  
Reassign to networking team. Network people, this has already been 
committed and might be worth committing to 7-stable as well. 
If not, please close the ticket. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=145600 
>Unformatted:
 >>      0 packets with ECN CE bit set
         154704 packets with ECN ECT(0) bit set
 >>      0 packets with ECN ECT(1) bit set
         7 successful ECN handshakes
         0 times ECN reduced the congestion window
 
 These two lines (marked >> above) stay zero even when a CE / ECT(1) frame is deliberately sent (as is the case in the TBIT NewECN test).
 
 After activating PF (with RED scheduling), these two codepoints are delivered by IP to TCP.
 
 Further testing revealed that another test case, the behavior when a CE/CWR segment is received, deviates from the ns2 reference implementation by the original authors of RFC3168.
 
 Reference implementation (from tcp-sink.cc in ns2):
 
 
 	if ( (sf != 0 && sf->cong_action()) || of->cong_action() ) 
 		// Sender has responsed to congestion. 
 		acker_->update_ecn_unacked(0);
 	if ( (sf != 0 && sf->ect() && sf->ce())  || 
 			(of->ect() && of->ce()) )
 		// New report of congestion.  
 		acker_->update_ecn_unacked(1);
 	if ( (sf != 0 && sf->ect()) || of->ect() )
 		// Set EcnEcho bit.  
 		nf->ecnecho() = acker_->ecn_unacked();
 
 Basically, CWR is checked first and ECE is cleared; if that segment also carries the CE codepoint, ECE is set anew.
 
 Implementation in tcp_input.c:
 
         /*
          * TCP ECN processing.
          */
         if (tp->t_flags & TF_ECN_PERMIT) {
                 switch (iptos & IPTOS_ECN_MASK) {
                 case IPTOS_ECN_CE:
                         tp->t_flags |= TF_ECN_SND_ECE;
                         TCPSTAT_INC(tcps_ecn_ce);
                         break;
                 case IPTOS_ECN_ECT0:
                         TCPSTAT_INC(tcps_ecn_ect0);
                         break;
                 case IPTOS_ECN_ECT1:
                         TCPSTAT_INC(tcps_ecn_ect1);
                         break;
                 }
 
                 if (thflags & TH_CWR)
                         tp->t_flags &= ~TF_ECN_SND_ECE;
  
                 /*
                  * Congestion experienced.
                  * Ignore if we are already trying to recover.
                  */
                 if ((thflags & TH_ECE) &&
                     SEQ_LEQ(th->th_ack, tp->snd_recover)) {
                         TCPSTAT_INC(tcps_ecn_rcwnd);
                         tcp_congestion_exp(tp);
                 }
         }
 
 Here, CE is checked first and CWR later, so a CE/CWR segment is answered with a normal ACK instead of an ECE/ACK.
 
 
 This issue is problematic, as it precludes further (currently investigated) enhancements that would let ECN senders react to congestion marks in a more informed way.
