From delphij@frontfree.net  Sat Jun 19 12:15:04 2004
Return-Path: <delphij@frontfree.net>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8BCEC16A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 19 Jun 2004 12:15:04 +0000 (GMT)
Received: from mail.FreeBSD.org.cn (dns3.freebsd.org.cn [61.129.66.75])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E7EE243D2D
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 19 Jun 2004 12:15:02 +0000 (GMT)
	(envelope-from delphij@frontfree.net)
Received: (qmail 87095 invoked by uid 0); 19 Jun 2004 12:13:43 -0000
Received: from unknown (HELO beastie.frontfree.net) (218.107.145.7)
  by mail.FreeBSD.org.cn with AES256-SHA encrypted SMTP; 19 Jun 2004 12:13:43 -0000
Received: from localhost (localhost.frontfree.net [127.0.0.1])
	by beastie.frontfree.net (Postfix) with ESMTP id 369B111A26
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 19 Jun 2004 20:13:55 +0800 (CST)
Received: from beastie.frontfree.net ([127.0.0.1])
 by localhost (beastie.frontfree.net [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 01912-10 for <FreeBSD-gnats-submit@freebsd.org>;
 Sat, 19 Jun 2004 20:13:54 +0800 (CST)
Received: by beastie.frontfree.net (Postfix, from userid 1001)
	id 1881611A17; Sat, 19 Jun 2004 20:13:51 +0800 (CST)
Message-Id: <20040619121351.1881611A17@beastie.frontfree.net>
Date: Sat, 19 Jun 2004 20:13:51 +0800 (CST)
From: Xin LI <delphij@FreeBSD.org.cn>
Reply-To: Xin LI <delphij@FreeBSD.org.cn>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [PATCH] RFC 3522 for -HEAD
X-Send-Pr-Version: 3.113
X-GNATS-Notify: delphij@FreeBSD.org

>Number:         68110
>Category:       kern
>Synopsis:       [netinet] [patch] RFC 3522 for -HEAD
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    delphij
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 19 12:20:23 GMT 2004
>Closed-Date:    Thu Apr 12 09:10:42 GMT 2007
>Last-Modified:  Thu Apr 12 09:10:42 GMT 2007
>Originator:     Xin LI
>Release:        FreeBSD 5.2-delphij i386
>Organization:
The FreeBSD Simplified Chinese Project
>Environment:
System: FreeBSD beastie.frontfree.net 5.2-delphij FreeBSD 5.2-delphij #66: Tue Jun 15 11:25:44 CST 2004 root@beastie.frontfree.net:/usr/obj/usr/src/sys/BEASTIE i386


>Description:
	The attached patch brings RFC 3522 (Eifel detection) to FreeBSD.
	The work originates from DragonFlyBSD, which implemented RFC 3522 last August. It would be good for FreeBSD to have an RFC 3522 implementation before 5.3-RELEASE.
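
	For readers unfamiliar with the algorithm, the core test can be sketched as below. This is only an illustration, not code from the patch: struct eifel_state and both function names are hypothetical, and the sketch compares timestamps with a signed difference to tolerate wraparound, whereas the patch uses a plain `<`. The idea is that if the first acceptable ACK after a retransmit echoes a timestamp older than the retransmit itself, the ACK was triggered by the original transmission and the retransmit was spurious.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; names are illustrative only. */
struct eifel_state {
	uint32_t rexmt_ts;	/* timestamp clock value at the retransmit */
	bool	 first_acc_ack;	/* still waiting for the first acceptable ACK */
};

/* Record the timestamp clock value when a retransmit is sent. */
static void
eifel_note_retransmit(struct eifel_state *s, uint32_t now)
{
	assert(s != NULL);
	s->rexmt_ts = now;
	s->first_acc_ack = true;
}

/*
 * Called for an acceptable ACK.  Returns true if the last retransmit
 * looks spurious.  Only the first acceptable ACK after a retransmit is
 * examined; an ACK without a usable timestamp echo (tsecr == 0) gives
 * no verdict, so the caller would fall back to some other heuristic.
 */
static bool
eifel_ack_is_spurious(struct eifel_state *s, bool has_ts, uint32_t tsecr)
{
	assert(s != NULL);
	if (!s->first_acc_ack)
		return (false);
	s->first_acc_ack = false;	/* stop looking after this ACK */
	if (!has_ts || tsecr == 0)
		return (false);
	/* Assumption: signed difference, so clock wraparound is tolerated. */
	return ((int32_t)(tsecr - s->rexmt_ts) < 0);
}
```

	In this sketch a caller would invoke eifel_note_retransmit() when retransmitting and, if eifel_ack_is_spurious() later returns true, restore the congestion state that was saved before the retransmit.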
>How-To-Repeat:
	N/A
>Fix:
	Apply the attached patchset against HEAD.

	Please be aware that the attached patch causes an ABI change. I will write an UPDATING entry if this patchset is accepted.
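
	A rough illustration of why the layout change matters, using cut-down and purely hypothetical copies of the counters around the modified spot in struct tcpstat (the real structure has many more members): replacing one counter with four moves every later member, so a binary built against the old header would presumably read the wrong fields until it is rebuilt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical excerpt of the layout before the patch. */
struct tcpstat_before {
	unsigned long tcps_sndrexmitbyte;
	unsigned long tcps_sndrexmitbad;
	unsigned long tcps_sndacks;
};

/* Hypothetical excerpt of the layout after the patch. */
struct tcpstat_after {
	unsigned long tcps_sndrexmitbyte;
	unsigned long tcps_sndrtobad;
	unsigned long tcps_sndfastrexmitbad;
	unsigned long tcps_eifeldetected;
	unsigned long tcps_rttdetected;
	unsigned long tcps_sndacks;
};

/* The member that follows the changed counters no longer sits where it did. */
static_assert(offsetof(struct tcpstat_after, tcps_sndacks) >
    offsetof(struct tcpstat_before, tcps_sndacks),
    "tcps_sndacks moves to a higher offset");

/* Number of bytes by which tcps_sndacks moves between the two layouts. */
static size_t
sndacks_shift(void)
{
	return (offsetof(struct tcpstat_after, tcps_sndacks) -
	    offsetof(struct tcpstat_before, tcps_sndacks));
}
```

	The same reasoning applies to the new t_rexmtTS member added to struct tcpcb.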

--- rfc3522.diff begins here ---

Index: src/sys/netinet/tcp_input.c
diff -u src/sys/netinet/tcp_input.c:1.241 src/sys/netinet/tcp_input.c:1.241.1000.1
--- src/sys/netinet/tcp_input.c:1.241	Wed Jun 16 17:35:07 2004
+++ src/sys/netinet/tcp_input.c	Sat Jun 19 17:21:09 2004
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2002-2003 Jeffrey Hsu
  * Copyright (c) 1982, 1986, 1988, 1990, 1993, 1994, 1995
  *	The Regents of the University of California.  All rights reserved.
  *
@@ -131,6 +132,11 @@
     &tcp_do_rfc3390, 0,
     "Enable RFC 3390 (Increasing TCP's Initial Congestion Window)");
 
+static int tcp_do_eifel_detect = 1;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, eifel, CTLFLAG_RW,
+    &tcp_do_eifel_detect, 0,
+    "Eifel detection algorithm (RFC 3522)");
+
 SYSCTL_NODE(_net_inet_tcp, OID_AUTO, reass, CTLFLAG_RW, 0,
 	    "TCP Segment Reassembly Queue");
 
@@ -1130,19 +1136,26 @@
 				++tcpstat.tcps_predack;
 				/*
 				 * "bad retransmit" recovery
+				 *
+				 * If Eifel detection applies, then
+				 * it is deterministic, so use it
+				 * unconditionally over the old heuristic
+				 * Otherwise, fall back to the old heuristic.
 				 */
-				if (tp->t_rxtshift == 1 &&
+				if (tcp_do_eifel_detect &&
+				    (to.to_flags & TOF_TS) && to.to_tsecr &&
+				    (tp->t_flags & TF_FIRSTACCACK)) {
+					/* Eifel detection applicable. */
+					if (to.to_tsecr < tp->t_rexmtTS) {
+						tcp_revert_congestion_state(tp);
+						++tcpstat.tcps_eifeldetected;
+					}
+				} else if (tp->t_rxtshift == 1 &&
 				    ticks < tp->t_badrxtwin) {
-					++tcpstat.tcps_sndrexmitbad;
-					tp->snd_cwnd = tp->snd_cwnd_prev;
-					tp->snd_ssthresh =
-					    tp->snd_ssthresh_prev;
-					tp->snd_recover = tp->snd_recover_prev;
-					if (tp->t_flags & TF_WASFRECOVERY)
-					    ENTER_FASTRECOVERY(tp);
-					tp->snd_nxt = tp->snd_max;
-					tp->t_badrxtwin = 0;
+					tcp_revert_congestion_state(tp);
+					++tcpstat.tcps_rttdetected;
 				}
+				tp->t_flags &= ~(TF_FIRSTACCACK | TF_FASTREXMT);
 
 				/*
 				 * Recalculate the transmit timer / rtt.
@@ -1911,6 +1924,11 @@
 						tp->t_dupacks = 0;
 						break;
 					}
+					if (tcp_do_eifel_detect &&
+					    (tp->t_flags & TF_RCVD_TSTMP)) {
+						tcp_save_congestion_state(tp);
+						tp->t_flags |= TF_FASTREXMT;
+					}
 					win = min(tp->snd_wnd, tp->snd_cwnd) /
 					    2 / tp->t_maxseg;
 					if (win < 2)
@@ -2037,15 +2055,17 @@
 		 * original cwnd and ssthresh, and proceed to transmit where
 		 * we left off.
 		 */
-		if (tp->t_rxtshift == 1 && ticks < tp->t_badrxtwin) {
-			++tcpstat.tcps_sndrexmitbad;
-			tp->snd_cwnd = tp->snd_cwnd_prev;
-			tp->snd_ssthresh = tp->snd_ssthresh_prev;
-			tp->snd_recover = tp->snd_recover_prev;
-			if (tp->t_flags & TF_WASFRECOVERY)
-				ENTER_FASTRECOVERY(tp);
-			tp->snd_nxt = tp->snd_max;
-			tp->t_badrxtwin = 0;	/* XXX probably not required */ 
+		if (tcp_do_eifel_detect && acked &&
+		    (to.to_flags & TOF_TS) && to.to_tsecr &&
+		    (tp->t_flags & TF_FIRSTACCACK)) {
+			/* Eifel detection applicable. */
+			if (to.to_tsecr < tp->t_rexmtTS) {
+				tcp_revert_congestion_state(tp);
+				++tcpstat.tcps_eifeldetected;
+			}
+		} else if (tp->t_rxtshift == 1 && ticks < tp->t_badrxtwin) {
+			tcp_revert_congestion_state(tp);
+			++tcpstat.tcps_rttdetected;
 		}
 
 		/*
@@ -2090,6 +2110,9 @@
 		if (acked == 0)
 			goto step6;
 
+		/* Stop looking for an acceptable ACK since one was received. */
+		tp->t_flags &= ~(TF_FIRSTACCACK | TF_FASTREXMT);
+
 		/*
 		 * When new data is acked, open the congestion window.
 		 * If the window gives us less than ssthresh packets
Index: src/sys/netinet/tcp_timer.c
diff -u src/sys/netinet/tcp_timer.c:1.64 src/sys/netinet/tcp_timer.c:1.64.1000.1
--- src/sys/netinet/tcp_timer.c:1.64	Thu Apr  8 04:46:14 2004
+++ src/sys/netinet/tcp_timer.c	Sat Jun 19 17:21:09 2004
@@ -467,6 +467,39 @@
 }
 
 void
+tcp_save_congestion_state(struct tcpcb *tp)
+{
+	tp->snd_cwnd_prev = tp->snd_cwnd;
+	tp->snd_ssthresh_prev = tp->snd_ssthresh;
+	tp->snd_recover_prev = tp->snd_recover;
+	if (IN_FASTRECOVERY(tp))
+	    tp->t_flags |= TF_WASFRECOVERY;
+	else
+	    tp->t_flags &= ~TF_WASFRECOVERY;
+	if (tp->t_flags & TF_RCVD_TSTMP) {
+		tp->t_rexmtTS = ticks;
+		tp->t_flags |= TF_FIRSTACCACK;
+	}
+}
+
+void
+tcp_revert_congestion_state(struct tcpcb *tp)
+{
+	tp->snd_cwnd = tp->snd_cwnd_prev;
+	tp->snd_ssthresh = tp->snd_ssthresh_prev;
+	tp->snd_recover = tp->snd_recover_prev;
+	if (tp->t_flags & TF_WASFRECOVERY)
+	    ENTER_FASTRECOVERY(tp);
+	if (tp->t_flags & TF_FASTREXMT)
+	    ++tcpstat.tcps_sndfastrexmitbad;
+	else
+	    ++tcpstat.tcps_sndrtobad;
+	tp->t_badrxtwin = 0;
+	tp->t_rxtshift = 0;
+	tp->snd_nxt = tp->snd_max;
+}
+
+void
 tcp_timer_rexmt(xtp)
 	void *xtp;
 {
@@ -521,14 +554,9 @@
 		 * "On Estimating End-to-End Network Path Properties" by
 		 * Allman and Paxson for more details.
 		 */
-		tp->snd_cwnd_prev = tp->snd_cwnd;
-		tp->snd_ssthresh_prev = tp->snd_ssthresh;
-		tp->snd_recover_prev = tp->snd_recover;
-		if (IN_FASTRECOVERY(tp))
-		  tp->t_flags |= TF_WASFRECOVERY;
-		else
-		  tp->t_flags &= ~TF_WASFRECOVERY;
 		tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1));
+		tcp_save_congestion_state(tp);
+		tp->t_flags &= ~TF_FASTREXMT;
 	}
 	tcpstat.tcps_rexmttimeo++;
 	if (tp->t_state == TCPS_SYN_SENT)
Index: src/sys/netinet/tcp_var.h
diff -u src/sys/netinet/tcp_var.h:1.105 src/sys/netinet/tcp_var.h:1.105.1000.3
--- src/sys/netinet/tcp_var.h:1.105	Mon Apr 26 10:56:31 2004
+++ src/sys/netinet/tcp_var.h	Sat Jun 19 18:35:10 2004
@@ -78,29 +78,32 @@
 	struct	inpcb *t_inpcb;		/* back pointer to internet pcb */
 	int	t_state;		/* state of this connection */
 	u_int	t_flags;
-#define	TF_ACKNOW	0x000001	/* ack peer immediately */
-#define	TF_DELACK	0x000002	/* ack, but try to delay it */
-#define	TF_NODELAY	0x000004	/* don't delay packets to coalesce */
-#define	TF_NOOPT	0x000008	/* don't use tcp options */
-#define	TF_SENTFIN	0x000010	/* have sent FIN */
-#define	TF_REQ_SCALE	0x000020	/* have/will request window scaling */
-#define	TF_RCVD_SCALE	0x000040	/* other side has requested scaling */
-#define	TF_REQ_TSTMP	0x000080	/* have/will request timestamps */
-#define	TF_RCVD_TSTMP	0x000100	/* a timestamp was received in SYN */
-#define	TF_SACK_PERMIT	0x000200	/* other side said I could SACK */
-#define	TF_NEEDSYN	0x000400	/* send SYN (implicit state) */
-#define	TF_NEEDFIN	0x000800	/* send FIN (implicit state) */
-#define	TF_NOPUSH	0x001000	/* don't push */
-#define	TF_REQ_CC	0x002000	/* have/will request CC */
-#define	TF_RCVD_CC	0x004000	/* a CC was received in SYN */
-#define	TF_SENDCCNEW	0x008000	/* send CCnew instead of CC in SYN */
-#define	TF_MORETOCOME	0x010000	/* More data to be appended to sock */
-#define	TF_LQ_OVERFLOW	0x020000	/* listen queue overflow */
-#define	TF_LASTIDLE	0x040000	/* connection was previously idle */
-#define	TF_RXWIN0SENT	0x080000	/* sent a receiver win 0 in response */
-#define	TF_FASTRECOVERY	0x100000	/* in NewReno Fast Recovery */
-#define	TF_WASFRECOVERY	0x200000	/* was in NewReno Fast Recovery */
-#define	TF_SIGNATURE	0x400000	/* require MD5 digests (RFC2385) */
+#define	TF_ACKNOW	0x00000001	/* ack peer immediately */
+#define	TF_DELACK	0x00000002	/* ack, but try to delay it */
+#define	TF_NODELAY	0x00000004	/* don't delay packets to coalesce */
+#define	TF_NOOPT	0x00000008	/* don't use tcp options */
+#define	TF_SENTFIN	0x00000010	/* have sent FIN */
+#define	TF_REQ_SCALE	0x00000020	/* have/will request window scaling */
+#define	TF_RCVD_SCALE	0x00000040	/* other side has requested scaling */
+#define	TF_REQ_TSTMP	0x00000080	/* have/will request timestamps */
+#define	TF_RCVD_TSTMP	0x00000100	/* a timestamp was received in SYN */
+#define	TF_SACK_PERMIT	0x00000200	/* other side said I could SACK */
+#define	TF_NEEDSYN	0x00000400	/* send SYN (implicit state) */
+#define	TF_NEEDFIN	0x00000800	/* send FIN (implicit state) */
+#define	TF_NOPUSH	0x00001000	/* don't push */
+#define	TF_REQ_CC	0x00002000	/* have/will request CC */
+#define	TF_RCVD_CC	0x00004000	/* a CC was received in SYN */
+#define	TF_SENDCCNEW	0x00008000	/* send CCnew instead of CC in SYN */
+#define	TF_MORETOCOME	0x00010000	/* More data to be appended to sock */
+#define	TF_LQ_OVERFLOW	0x00020000	/* listen queue overflow */
+#define	TF_LASTIDLE	0x00040000	/* connection was previously idle */
+#define	TF_RXWIN0SENT	0x00080000	/* sent a receiver win 0 in response */
+#define	TF_FASTRECOVERY	0x00100000	/* in NewReno Fast Recovery */
+#define	TF_WASFRECOVERY	0x00200000	/* was in NewReno Fast Recovery */
+#define	TF_SIGNATURE	0x00400000	/* require MD5 digests (RFC2385) */
+#define	TF_FIRSTACCACK	0x00800000	/* Look for 1st acceptable ACK. */
+#define	TF_FASTREXMT	0x01000000	/* Did Fast Retransmit. */
+
 	int	t_force;		/* 1 if forcing out a byte */
 
 	tcp_seq	snd_una;		/* send unacknowledged */
@@ -174,6 +177,7 @@
 	u_long	snd_ssthresh_prev;	/* ssthresh prior to retransmit */
 	tcp_seq	snd_recover_prev;	/* snd_recover prior to retransmit */
 	u_long	t_badrxtwin;		/* window for retransmit recovery */
+	u_long	t_rexmtTS;		/* timestamp of last retransmit */
 	u_char	snd_limited;		/* segments limited transmitted */
 /* anti DoS counters */
 	u_long	rcv_second;		/* start of interval second */
@@ -371,7 +375,10 @@
 	u_long	tcps_sndbyte;		/* data bytes sent */
 	u_long	tcps_sndrexmitpack;	/* data packets retransmitted */
 	u_long	tcps_sndrexmitbyte;	/* data bytes retransmitted */
-	u_long	tcps_sndrexmitbad;	/* unnecessary packet retransmissions */
+	u_long	tcps_sndrtobad;		/* spurious RTO retransmissions */
+	u_long	tcps_sndfastrexmitbad;	/* spurious Fast Retransmissions */
+	u_long	tcps_eifeldetected;	/* Eifel-detected spurious rexmits */
+	u_long	tcps_rttdetected;	/* RTT-detected spurious RTO rexmits */
 	u_long	tcps_sndacks;		/* ack-only packets sent */
 	u_long	tcps_sndprobe;		/* window probes sent */
 	u_long	tcps_sndurg;		/* packets sent with URG only */
@@ -538,6 +545,8 @@
 void	 tcp_respond(struct tcpcb *, void *,
 	    struct tcphdr *, struct mbuf *, tcp_seq, tcp_seq, int);
 int	 tcp_twrespond(struct tcptw *, int);
+void	 tcp_save_congestion_state(struct tcpcb *tp);
+void	 tcp_revert_congestion_state(struct tcpcb *tp);
 void	 tcp_setpersist(struct tcpcb *);
 #ifdef TCP_SIGNATURE
 int	 tcp_signature_compute(struct mbuf *, int, int, int, u_char *, u_int);
Index: src/usr.bin/netstat/inet.c
diff -u src/usr.bin/netstat/inet.c:1.65 src/usr.bin/netstat/inet.c:1.65.1000.1
--- src/usr.bin/netstat/inet.c:1.65	Wed Jun 16 15:00:50 2004
+++ src/usr.bin/netstat/inet.c	Sat Jun 19 18:33:40 2004
@@ -382,8 +382,10 @@
 		"\t\t%lu data packet%s (%lu byte%s)\n");
 	p2(tcps_sndrexmitpack, tcps_sndrexmitbyte,
 		"\t\t%lu data packet%s (%lu byte%s) retransmitted\n");
-	p(tcps_sndrexmitbad,
-		"\t\t%lu data packet%s unnecessarily retransmitted\n");
+	p(tcps_sndrtobad, "\t\t%lu spurious RTO retransmit%s\n");
+	p(tcps_sndfastrexmitbad, "\t\t%lu spurious Fast Retransmit%s\n");
+	p(tcps_eifeldetected, "\t\t%lu Eifel-detected spurious retransmit%s\n");
+	p(tcps_rttdetected, "\t\t%lu RTT-detected spurious retransmit%s\n");
 	p(tcps_mturesent, "\t\t%lu resend%s initiated by MTU discovery\n");
 	p2a(tcps_sndacks, tcps_delack,
 		"\t\t%lu ack-only packet%s (%lu delayed)\n");
--- rfc3522.diff ends here ---


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->bms 
Responsible-Changed-By: bms 
Responsible-Changed-When: Wed Jun 23 06:59:12 GMT 2004 
Responsible-Changed-Why:  
I'll try to look at this 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68110 
Responsible-Changed-From-To: bms->hsu 
Responsible-Changed-By: hsu 
Responsible-Changed-When: Fri Jun 25 09:12:43 GMT 2004 
Responsible-Changed-Why:  
This is my work. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68110 
Responsible-Changed-From-To: hsu->freebsd-ports 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Oct 25 03:51:24 GMT 2005 
Responsible-Changed-Why:  
With bugmeister hat on, reset assignment due to committer inactivity. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68110 
Responsible-Changed-From-To: freebsd-ports->hsu 
Responsible-Changed-By: hsu 
Responsible-Changed-When: Tue Oct 25 06:53:35 GMT 2005 
Responsible-Changed-Why:  
Original developer actively looking for ways to fund his BSD TCP and IP work. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68110 

From: Mario Sergio Fujikawa Ferreira <lioux@FreeBSD.org>
To: bug-followup@FreeBSD.org
Cc: Xin LI <delphij@FreeBSD.org.cn>, bms@FreeBSD.org
Subject: Re: kern/68110 : [netinet] [patch] RFC 3522 for -HEAD
Date: Sat, 21 Jan 2006 12:33:54 -0200

 	Are there any new developments? I mean, is this being
 studied for FreeBSD -CURRENT? Are there plans for testing against
 6-STABLE?
 
 	Is there anything I can do to help? As with RFC 3522, do you know
 if any work is being done on RFC 2988? DSACK? Early retransmit?
 TCP auto-tuning, as in NetBSD?
 
 	Regards,
 
 -- 
 Mario S F Ferreira - DF - Brazil - "I guess this is a signature."
 feature, n: a documented bug | bug, n: an undocumented feature
State-Changed-From-To: open->suspended 
State-Changed-By: linimon 
State-Changed-When: Mon Apr 3 21:24:24 UTC 2006 
State-Changed-Why:  
Mark as suspended since nothing has happened on this for ~6 months. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68110 
State-Changed-From-To: suspended->closed 
State-Changed-By: delphij 
State-Changed-When: Thu Apr 12 09:04:00 UTC 2007 
State-Changed-Why:  
Mark as closed: silby@ has pointed out that RFC 3522 is covered 
by an Ericsson patent, which seems to be unfriendly to those who 
want to use our code in a commercial product: 

http://www.ietf.org/ietf/IPR/ERICSSON-EIFEL 

Given that the current IP status of Eifel detection makes it 
unsuitable for inclusion in a freely redistributable operating 
system, we have no other choice but to close this PR, and hope that 
the IP holder will consider a more friendly license policy in 
the future, so that we will be able to include it in FreeBSD. 

The final version of the patchset can be obtained from: 

http://research.delphij.net/freebsd/rfc3522.diff 

Thanks to everyone who has worked on this. 


Responsible-Changed-From-To: hsu->delphij 
Responsible-Changed-By: delphij 
Responsible-Changed-When: Thu Apr 12 09:04:00 UTC 2007 
Responsible-Changed-Why:  
I am responsible for killing this PR. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68110 
>Unformatted:

This experimental patch omits the 4-clause copyright notice that it
is held under, and is not urgent in the least.
