From mjl@luckie.org.nz  Sun May 16 09:52:48 2010
Return-Path: <mjl@luckie.org.nz>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5FF141065672
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 16 May 2010 09:52:48 +0000 (UTC)
	(envelope-from mjl@luckie.org.nz)
Received: from mailfilter68.ihug.co.nz (mailfilter68.ihug.co.nz [203.109.136.68])
	by mx1.freebsd.org (Postfix) with ESMTP id EDD7B8FC1F
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 16 May 2010 09:52:47 +0000 (UTC)
Received: from 118-93-81-147.dsl.dyn.ihug.co.nz (HELO spandex.luckie.org.nz) ([118.93.81.147])
  by cust.filter4.content.vf.net.nz with ESMTP/TLS/DHE-RSA-AES256-SHA; 16 May 2010 21:52:44 +1200
Received: from mylar.luckie.org.nz ([192.168.1.24])
	by spandex.luckie.org.nz with esmtps (TLSv1:AES256-SHA:256)
	(Exim 4.71 (FreeBSD))
	(envelope-from <mjl@luckie.org.nz>)
	id 1ODaWZ-000Awh-NZ
	for FreeBSD-gnats-submit@freebsd.org; Sun, 16 May 2010 21:52:43 +1200
Received: from mjl by mylar.luckie.org.nz with local (Exim 4.71 (FreeBSD))
	(envelope-from <mjl@mylar.luckie.org.nz>)
	id 1ODaWh-0000Sh-Ql
	for FreeBSD-gnats-submit@freebsd.org; Sun, 16 May 2010 21:52:51 +1200
Message-Id: <E1ODaWh-0000Sh-Ql@mylar.luckie.org.nz>
Date: Sun, 16 May 2010 21:52:51 +1200
From: Matthew Luckie <mjl@luckie.org.nz>
Reply-To: Matthew Luckie <mjl@luckie.org.nz>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [patch] TCP does not clear DF when MTU is below a threshold
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         146628
>Category:       kern
>Synopsis:       [tcp] [patch] TCP does not clear DF when MTU is below a threshold
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    andre
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 16 10:00:14 UTC 2010
>Closed-Date:    Mon Aug 23 14:25:04 UTC 2010
>Last-Modified:  Mon Aug 23 14:25:04 UTC 2010
>Originator:     Matthew Luckie
>Release:        FreeBSD 8.0-STABLE i386
>Organization:
>Environment:
System: FreeBSD mylar.luckie.org.nz 8.0-STABLE FreeBSD 8.0-STABLE #3: Sun May 16 21:31:15 NZST 2010 root@mylar.luckie.org.nz:/usr/src/sys/i386/compile/mylar i386

>Description:

FreeBSD, like most operating systems, will refuse to lower TCP's
segment size below a threshold in response to an ICMP needfrag.  In
FreeBSD's case, this threshold is 512 bytes.  If a needfrag with a
next-hop MTU of 256 is received, FreeBSD will reduce the connection's
segment size to 512 bytes and will then resend the presumed missing
packet, but without first clearing the DF bit.  If the path MTU is in
fact less than 512 bytes, FreeBSD will get another needfrag, which it
will ignore.  The patch below causes subsequent segments to be sent
without the DF bit set, and does not change FreeBSD's default
behaviour of refusing to reduce its segment size below a defined
threshold.
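A minimal C sketch of the pre-fix behaviour described above. It is a hypothetical model, not kernel code: it assumes the defaults quoted in the r211333 commit log further down (net.inet.tcp.minmss 216, net.inet.tcp.mssdflt 512) and 40 bytes for sizeof(struct tcpiphdr). The clamp mirrors the lines later removed from tcp_ctlinput(), and the DF decision mirrors the old `if (V_path_mtu_discovery)` test in tcp_output(). It also assumes the ICMP message carries a non-zero next-hop MTU, so the ip_next_mtu() fallback is left out. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed defaults (from the r211333 commit log) and an assumed
 * 40-byte sizeof(struct tcpiphdr) for IPv4 without options. */
enum { TCPIP_HDR_LEN = 40, MINMSS = 216, MSSDFLT = 512 };

/* Hypothetical result: the MTU the connection ends up using and
 * whether tcp_output() would set IP_DF on the next segment. */
struct pmtu_result {
    int  mtu;
    bool df;
};

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical model of the pre-fix code path. */
static struct pmtu_result old_pmtu_update(int icmp_mtu, bool path_mtu_discovery)
{
    struct pmtu_result r;
    int mtu = icmp_mtu;

    assert(icmp_mtu > 0); /* ip_next_mtu() fallback not modelled */

    /* Old tcp_ctlinput(): a suggestion below the floor is discarded... */
    if (mtu < max_int(296, MINMSS + TCPIP_HDR_LEN))
        mtu = 0;
    /* ...and replaced by the default MSS plus headers (512 + 40). */
    if (mtu == 0)
        mtu = MSSDFLT + TCPIP_HDR_LEN;

    r.mtu = mtu;
    /* Old tcp_output(): DF depends only on the sysctl, not on mtu. */
    r.df = path_mtu_discovery;
    return r;
}
```

With a suggested MTU of 256, the model returns 552 (an MSS of 512) with DF still set. On a path narrower than 512 bytes, the retransmission therefore draws another needfrag, which is handled the same way, so the connection never makes progress.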

>How-To-Repeat:

install net/scamper

scamper -F ipfw -I "tbit -M 256 -u '<url on webserver>' -i <ip address>"

>Fix:

--- patch-pmtud begins here ---
--- tcp_var.h.orig	2009-08-03 20:13:06.000000000 +1200
+++ tcp_var.h	2010-05-14 21:03:42.000000000 +1200
@@ -234,6 +234,7 @@
 #define	TF_ECN_PERMIT	0x4000000	/* connection ECN-ready */
 #define	TF_ECN_SND_CWR	0x8000000	/* ECN CWR in queue */
 #define	TF_ECN_SND_ECE	0x10000000	/* ECN ECE in queue */
+#define TF_IPDF		0x20000000	/* set the DF bit */
 
 #define IN_FASTRECOVERY(tp)	(tp->t_flags & TF_FASTRECOVERY)
 #define ENTER_FASTRECOVERY(tp)	tp->t_flags |= TF_FASTRECOVERY
--- tcp_subr.c.orig	2009-08-03 20:13:06.000000000 +1200
+++ tcp_subr.c	2010-05-16 21:26:50.000000000 +1200
@@ -656,7 +656,9 @@
 		tlen += sizeof (struct tcpiphdr);
 		ip->ip_len = tlen;
 		ip->ip_ttl = V_ip_defttl;
-		if (V_path_mtu_discovery)
+		if (tp != NULL && tp->t_flags & TF_IPDF)
+			ip->ip_off |= IP_DF;
+		else if (tp == NULL && V_path_mtu_discovery)
 			ip->ip_off |= IP_DF;
 	}
 	m->m_len = tlen;
@@ -757,6 +759,9 @@
 		tp->t_flags = (TF_REQ_SCALE|TF_REQ_TSTMP);
 	if (V_tcp_do_sack)
 		tp->t_flags |= TF_SACK_PERMIT;
+	if (V_path_mtu_discovery)
+		tp->t_flags |= TF_IPDF;
+
 	TAILQ_INIT(&tp->snd_holes);
 	tp->t_inpcb = inp;	/* XXX */
 	/*
@@ -1361,9 +1366,11 @@
 					    if (mtu < max(296, V_tcp_minmss
 						 + sizeof(struct tcpiphdr)))
 						mtu = 0;
-					    if (!mtu)
+					    if (!mtu) {
 						mtu = V_tcp_mssdflt
 						 + sizeof(struct tcpiphdr);
+						tp->t_flags &= ~TF_IPDF;
+					    }
 					    /*
 					     * Only cache the the MTU if it
 					     * is smaller than the interface
--- tcp_syncache.c.orig	2010-05-16 21:30:21.000000000 +1200
+++ tcp_syncache.c	2010-05-16 21:31:00.000000000 +1200
@@ -779,6 +779,9 @@
 	if (sc->sc_flags & SCF_ECN)
 		tp->t_flags |= TF_ECN_PERMIT;
 
+	if (V_path_mtu_discovery)
+		tp->t_flags |= TF_IPDF;
+
 	/*
 	 * Set up MSS and get cached values from tcp_hostcache.
 	 * This might overwrite some of the defaults we just set.
--- tcp_output.c.orig	2009-11-18 05:17:11.000000000 +1300
+++ tcp_output.c	2010-05-16 20:38:25.000000000 +1200
@@ -1181,7 +1181,7 @@
 	 * Section 2. However the tcp hostcache migitates the problem
 	 * so it affects only the first tcp connection with a host.
 	 */
-	if (V_path_mtu_discovery)
+	if (tp->t_flags & TF_IPDF)
 		ip->ip_off |= IP_DF;
 
 	error = ip_output(m, tp->t_inpcb->inp_options, NULL,
--- patch-pmtud ends here ---


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun May 16 19:04:46 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146628 
Responsible-Changed-From-To: freebsd-net->andre 
Responsible-Changed-By: andre 
Responsible-Changed-When: Tue Aug 10 22:08:02 UTC 2010 
Responsible-Changed-Why:  
Take over. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146628 

From: Andre Oppermann <oppermann@networx.ch>
To: mjl@luckie.org.nz
Cc: bug-followup@freebsd.org
Subject: Re: kern/146628: [tcp] [patch] TCP does not clear DF when MTU is
 below a threshold
Date: Thu, 12 Aug 2010 15:16:05 +0200

 Matthew,
 
 Thank you for your bug report and the proposed patch.  I have analysed
 the problem further and there is indeed a bug in the way the suggested
 MTU from the ICMP fragmentation needed message is handled.
 
 The attached, different, patch tries to fix the issue in two ways:
   - the suggested value from the ICMP message is no longer ignored if it
     goes below the configured minmss value.  Instead, minmss plus
     sizeof(struct tcpiphdr) is used as the floor and as the new MTU.
     The use of the default MSS is removed.
   - the DF flag is not set whenever the MSS is equal to or lower than
     minmss.  This allows even smaller MTUs to be used with fragmentation.
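The two changes in the list above can be sketched in C as follows. This is a hypothetical model, not the kernel code: it assumes the default minmss of 216 and a 40-byte sizeof(struct tcpiphdr), and it treats the MSS as the MTU minus those 40 bytes. The DF rule follows the wording of the second item (DF only while the MSS is strictly above minmss). Function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed default minmss and 40-byte sizeof(struct tcpiphdr). */
enum { TCPIP_HDR_LEN = 40, MINMSS = 216 };

/* Hypothetical model of the revised tcp_ctlinput() clamp: a suggested
 * MTU below minmss + headers is raised to that floor instead of being
 * replaced by the default MSS. */
static int new_clamp_mtu(int icmp_mtu)
{
    assert(icmp_mtu > 0);
    if (icmp_mtu < MINMSS + TCPIP_HDR_LEN)
        return MINMSS + TCPIP_HDR_LEN;
    return icmp_mtu;
}

/* Hypothetical DF rule: set DF only while the MSS derived from the MTU
 * is strictly above minmss, so at the floor the gateway may fragment. */
static bool new_sets_df(bool path_mtu_discovery, int mtu)
{
    int mss = mtu - TCPIP_HDR_LEN;
    return path_mtu_discovery && mss > MINMSS;
}
```

For example, a suggested MTU of 255 is raised to 256, giving an MSS of 216; since that equals minmss, new_sets_df() returns false and the segment may be fragmented on the way.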
 
 Please test again with this patch applied.
 
 -- 
 Andre
 
 Index: tcp_subr.c
 ===================================================================
 --- tcp_subr.c  (revision 211211)
 +++ tcp_subr.c  (working copy)
 @@ -1339,11 +1325,9 @@
                                              if (!mtu)
                                                  mtu = ip_next_mtu(ip->ip_len,
                                                   1);
 -                                           if (mtu < max(296, V_tcp_minmss
 -                                                + sizeof(struct tcpiphdr)))
 -                                               mtu = 0;
 -                                           if (!mtu)
 -                                               mtu = V_tcp_mssdflt
 +                                           if (mtu < V_tcp_minmss
 +                                                + sizeof(struct tcpiphdr))
 +                                               mtu = V_tcp_minmss
                                                   + sizeof(struct tcpiphdr);
                                              /*
                                               * Only cache the the MTU if it
 Index: tcp_output.c
 ===================================================================
 --- tcp_output.c        (revision 211211)
 +++ tcp_output.c        (working copy)
 @@ -1183,8 +1221,10 @@
           * This might not be the best thing to do according to RFC3390
           * Section 2. However the tcp hostcache migitates the problem
           * so it affects only the first tcp connection with a host.
 +        *
 +        * NB: Don't set DF on small MTU/MSS to have a safe fallback.
           */
 -       if (V_path_mtu_discovery)
 +       if (V_path_mtu_discovery && tp->t_maxopd >= V_tcp_minmss)
                  ip->ip_off |= IP_DF;
 
          error = ip_output(m, tp->t_inpcb->inp_options, NULL,

From: Matthew Luckie <mjl@luckie.org.nz>
To: Andre Oppermann <oppermann@networx.ch>
Cc: bug-followup@freebsd.org
Subject: Re: kern/146628: [tcp] [patch] TCP does not clear DF when MTU is
 below a threshold
Date: Sat, 14 Aug 2010 11:45:02 +1200

 I tried your patch, but on my FreeBSD 8.1 system (I don't have a 
 -current system).  It will reduce its packet size to 256 now, but does 
 not clear the DF bit, and if I try something smaller than 256 it does 
 not reduce and does not clear the DF bit.  Sorry if the following detail 
 is not useful because I used 8.1.
 
 tbit from 192.168.1.11 to 192.168.1.24
   server-mss 1460, result: pmtud-success
   [  0.112] TX SYN       44  seq = 0:0             ac69
   [  0.112] RX SYN/ACK   44  seq = 0:1             000c
   [  0.167] TX           40  seq = 1:1             ac6a
   [  0.205] TX          225  seq = 1:1(185)        ac6b DF
   [  0.257] RX          372  seq = 1:186(332)      000d DF
   [  0.257] TX           40  seq = 186:333         ac6c
   [  0.257] RX         1500  seq = 333:186(1460)   000e DF
   [  0.299] TX PTB       56  mtu = 576
   [  0.299] RX          576  seq = 333:186(536)    000f DF
   [  0.299] RX          576  seq = 869:186(536)    0010 DF
   [  0.299] RX          576  seq = 1405:186(536)   0011 DF
   [  0.355] TX           40  seq = 186:869         ac6e
   [  0.356] RX          576  seq = 1941:186(536)   0012 DF
   [  0.356] RX          576  seq = 2477:186(536)   0013 DF
   [  0.399] TX FIN       40  seq = 186:1405        ac6f
   [  0.399] RX          576  seq = 3013:187(536)   0014 DF
   [  0.399] RX          576  seq = 3549:187(536)   0015 DF
   [  0.454] TX FIN       40  seq = 186:1941        ac70
   [  0.454] RX          576  seq = 4085:187(536)   0016 DF
   [  0.454] RX FIN       45  seq = 4621:187(5)     0017 DF
   [  0.498] TX FIN       40  seq = 186:2477        ac71
   [  0.498] RX FIN       40  seq = 4626:187        0018
   [  0.558] TX FIN       40  seq = 186:3013        ac72
   [  0.558] RX FIN       40  seq = 4626:187        0019
   [  0.610] TX           40  seq = 187:3549        ac73
   [  0.654] TX           40  seq = 187:4085        ac74
   [  0.706] TX           40  seq = 187:4621        ac75
   [  0.761] TX           40  seq = 187:4627        ac76
 
 tbit from 192.168.1.11 to 192.168.1.24
   server-mss 536, result: pmtud-success
   [  0.063] TX SYN       44  seq = 0:0             e5da
   [  0.063] RX SYN/ACK   44  seq = 0:1             001a
   [  0.106] TX           40  seq = 1:1             e5db
   [  0.149] TX          225  seq = 1:1(185)        e5dc DF
   [  0.150] RX          576  seq = 1:186(536)      001b DF
   [  0.206] TX PTB       56  mtu = 400
   [  0.206] RX          400  seq = 1:186(360)      001c DF
   [  0.206] RX          217  seq = 361:186(177)    001d DF
   [  0.250] TX           40  seq = 186:361         e5de
   [  0.250] RX          400  seq = 538:186(360)    001e DF
   [  0.250] RX          400  seq = 898:186(360)    001f DF
   [  0.300] TX FIN       40  seq = 186:538         e5df
   [  0.301] RX          394  seq = 1258:187(354)   0020 DF
   [  0.351] TX FIN       40  seq = 186:898         e5e0
   [  0.351] RX          400  seq = 1612:187(360)   0021 DF
   [  0.351] RX          400  seq = 1972:187(360)   0022 DF
   [  0.410] TX FIN       40  seq = 186:1258        e5e1
   [  0.411] RX          400  seq = 2332:187(360)   0023 DF
   [  0.411] RX          400  seq = 2692:187(360)   0024 DF
   [  0.450] TX           40  seq = 187:1612        e5e2
   [  0.451] RX          400  seq = 3052:187(360)   0025 DF
   [  0.506] TX           40  seq = 187:1972        e5e3
   [  0.506] RX          400  seq = 3412:187(360)   0026 DF
   [  0.506] RX          400  seq = 3772:187(360)   0027 DF
   [  0.568] TX           40  seq = 187:2332        e5e4
   [  0.568] RX          400  seq = 4132:187(360)   0028 DF
   [  0.568] RX FIN      174  seq = 4492:187(134)   0029 DF
   [  0.599] TX           40  seq = 187:2692        e5e5
   [  0.650] TX           40  seq = 187:3052        e5e6
   [  0.701] TX           40  seq = 187:3412        e5e7
   [  0.757] TX           40  seq = 187:3772        e5e8
   [  0.805] TX           40  seq = 187:4132        e5e9
   [  0.853] TX           40  seq = 187:4492        e5ea
   [  0.906] TX           40  seq = 187:4627        e5eb
 
 tbit from 192.168.1.11 to 192.168.1.24
   server-mss 360, result: pmtud-success
   [  0.044] TX SYN       44  seq = 0:0             be68
   [  0.044] RX SYN/ACK   44  seq = 0:1             002a
   [  0.093] TX           40  seq = 1:1             be69
   [  0.144] TX          225  seq = 1:1(185)        be6a DF
   [  0.145] RX          400  seq = 1:186(360)      002b DF
   [  0.194] TX PTB       56  mtu = 256
   [  0.195] RX          256  seq = 1:186(216)      002c DF
   [  0.195] RX          185  seq = 217:186(145)    002d DF
   [  0.244] TX           40  seq = 186:217         be6c
   [  0.244] RX          256  seq = 362:186(216)    002e DF
   [  0.244] RX          256  seq = 578:186(216)    002f DF
   [  0.296] TX FIN       40  seq = 186:362         be6d
   [  0.296] RX          256  seq = 794:187(216)    0030 DF
   [  0.345] TX FIN       40  seq = 186:578         be6e
   [  0.346] RX          256  seq = 1010:187(216)   0031 DF
   [  0.346] RX          256  seq = 1226:187(216)   0032 DF
   [  0.400] TX FIN       40  seq = 186:794         be6f
   [  0.401] RX          256  seq = 1442:187(216)   0033 DF
   [  0.401] RX          256  seq = 1658:187(216)   0034 DF
   [  0.455] TX           40  seq = 187:1010        be70
   [  0.455] RX          256  seq = 1874:187(216)   0035 DF
   [  0.455] RX          256  seq = 2090:187(216)   0036 DF
   [  0.501] TX           40  seq = 187:1226        be71
   [  0.501] RX          256  seq = 2306:187(216)   0037 DF
   [  0.501] RX          256  seq = 2522:187(216)   0038 DF
   [  0.549] TX           40  seq = 187:1442        be72
   [  0.549] RX          256  seq = 2738:187(216)   0039 DF
   [  0.550] RX          256  seq = 2954:187(216)   003a DF
   [  0.604] TX           40  seq = 187:1658        be73
   [  0.604] RX          256  seq = 3170:187(216)   003b DF
   [  0.604] RX          256  seq = 3386:187(216)   003c DF
   [  0.645] TX           40  seq = 187:1874        be74
   [  0.645] RX          256  seq = 3602:187(216)   003d DF
   [  0.645] RX          256  seq = 3818:187(216)   003e DF
   [  0.698] TX           40  seq = 187:2090        be75
   [  0.698] RX          256  seq = 4034:187(216)   003f DF
   [  0.698] RX          256  seq = 4250:187(216)   0040 DF
   [  0.749] TX           40  seq = 187:2306        be76
   [  0.749] RX FIN      200  seq = 4466:187(160)   0041 DF
   [  0.804] TX           40  seq = 187:2522        be77
   [  0.855] TX           40  seq = 187:2738        be78
   [  0.895] TX           40  seq = 187:2954        be79
   [  0.950] TX           40  seq = 187:3170        be7a
   [  0.994] TX           40  seq = 187:3386        be7b
   [  1.046] TX           40  seq = 187:3602        be7c
   [  1.093] TX           40  seq = 187:3818        be7d
   [  1.145] TX           40  seq = 187:4034        be7e
   [  1.195] TX           40  seq = 187:4250        be7f
   [  1.255] TX           40  seq = 187:4466        be80
   [  1.293] TX           40  seq = 187:4627        be81
 
 tbit from 192.168.1.11 to 192.168.1.24
   server-mss 216, result: pmtud-fail
   [  0.043] TX SYN       44  seq = 0:0             53ef
   [  0.043] RX SYN/ACK   44  seq = 0:1             0042
   [  0.090] TX           40  seq = 1:1             53f0
   [  0.154] TX          225  seq = 1:1(185)        53f1 DF
   [  0.154] RX          256  seq = 1:186(216)      0043 DF
   [  0.189] TX PTB       56  mtu = 255
   [  0.189] RX          256  seq = 1:186(216)      0044 DF
   [  0.243] TX PTB       56  mtu = 255
   [  0.244] RX          256  seq = 1:186(216)      0045 DF
   [  0.287] TX PTB       56  mtu = 255
   [  0.288] RX          256  seq = 1:186(216)      0046 DF
   [  0.349] TX PTB       56  mtu = 255
   [  0.350] RX          256  seq = 1:186(216)      0047 DF

From: Andre Oppermann <oppermann@networx.ch>
To: Matthew Luckie <mjl@luckie.org.nz>
Cc: bug-followup@freebsd.org
Subject: Re: kern/146628: [tcp] [patch] TCP does not clear DF when MTU is
 below a threshold
Date: Sun, 15 Aug 2010 10:11:53 +0200

 On 14.08.2010 01:45, Matthew Luckie wrote:
 > I tried your patch, but on my FreeBSD 8.1 system (I don't have a
 > -current system). It will reduce its packet size to 256 now, but does
 > not clear the DF bit, and if I try something smaller than 256 it does
 > not reduce and does not clear the DF bit. Sorry if the following detail
 > is not useful because I used 8.1.
 
 Thanks for testing.  Using 8.1 is fine as there is no difference in
 this part between -CURRENT and 8.1.
 
 Please try the attached patch with a small difference.  It will clear
 the DF bit in any case when the minmss floor is reached.
 
 -- 
 Andre
 
 [attachment: patch-3.diff]
 
 Index: tcp_subr.c
 ===================================================================
 --- tcp_subr.c  (revision 211211)
 +++ tcp_subr.c  (working copy)
 @@ -1339,11 +1325,9 @@
                                              if (!mtu)
                                                  mtu = ip_next_mtu(ip->ip_len,
                                                   1);
 -                                           if (mtu < max(296, V_tcp_minmss
 -                                                + sizeof(struct tcpiphdr)))
 -                                               mtu = 0;
 -                                           if (!mtu)
 -                                               mtu = V_tcp_mssdflt
 +                                           if (mtu < V_tcp_minmss
 +                                                + sizeof(struct tcpiphdr))
 +                                               mtu = V_tcp_minmss
                                                   + sizeof(struct tcpiphdr);
                                              /*
                                               * Only cache the the MTU if it
 Index: tcp_output.c
 ===================================================================
 --- tcp_output.c        (revision 211211)
 +++ tcp_output.c        (working copy)
 @@ -1183,8 +1221,10 @@
           * This might not be the best thing to do according to RFC3390
           * Section 2. However the tcp hostcache migitates the problem
           * so it affects only the first tcp connection with a host.
 +        *
 +        * NB: Don't set DF on small MTU/MSS to have a safe fallback.
           */
 -       if (V_path_mtu_discovery)
 +       if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
                  ip->ip_off |= IP_DF;
          error = ip_output(m, tp->t_inpcb->inp_options, NULL,
 

From: Matthew Luckie <mjl@luckie.org.nz>
To: Andre Oppermann <oppermann@networx.ch>
Cc: bug-followup@freebsd.org
Subject: Re: kern/146628: [tcp] [patch] TCP does not clear DF when MTU is
 below a threshold
Date: Sun, 15 Aug 2010 21:03:13 +1200

 On 08/15/10 20:11, Andre Oppermann wrote:
 > On 14.08.2010 01:45, Matthew Luckie wrote:
 >> I tried your patch, but on my FreeBSD 8.1 system (I don't have a
 >> -current system). It will reduce its packet size to 256 now, but does
 >> not clear the DF bit, and if I try something smaller than 256 it does
 >> not reduce and does not clear the DF bit. Sorry if the following detail
 >> is not useful because I used 8.1.
 >
 > Thanks for testing. Using 8.1 is fine as there is no difference in
 > this part between -CURRENT and 8.1.
 >
 > Please try the attached patch with a small difference. It will clear
 > the DF bit in any case when the minmss floor is reached.
 
 That patch works for me.  It reduces to 256, leaving the DF bit set for 
 MTU 256, but does not reduce further and will clear DF below 256.
 
 Odd, the only difference between the two patches is
 
 if (V_path_mtu_discovery && tp->t_maxopd >= V_tcp_minmss)
 
 became:
 
 if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
 
 Anyway, it looks like it's fixed now.  I like your patch better than mine.
 
 Matthew

From: Andre Oppermann <oppermann@networx.ch>
To: Matthew Luckie <mjl@luckie.org.nz>
Cc: bug-followup@freebsd.org
Subject: Re: kern/146628: [tcp] [patch] TCP does not clear DF when MTU is
 below a threshold
Date: Sun, 15 Aug 2010 11:34:29 +0200

 On 15.08.2010 11:03, Matthew Luckie wrote:
 > On 08/15/10 20:11, Andre Oppermann wrote:
 >> On 14.08.2010 01:45, Matthew Luckie wrote:
 >>> I tried your patch, but on my FreeBSD 8.1 system (I don't have a
 >>> -current system). It will reduce its packet size to 256 now, but does
 >>> not clear the DF bit, and if I try something smaller than 256 it does
 >>> not reduce and does not clear the DF bit. Sorry if the following detail
 >>> is not useful because I used 8.1.
 >>
 >> Thanks for testing. Using 8.1 is fine as there is no difference in
 >> this part between -CURRENT and 8.1.
 >>
 >> Please try the attached patch with a small difference. It will clear
 >> the DF bit in any case when the minmss floor is reached.
 >
 > That patch works for me. It reduces to 256, leaving the DF bit set for
 > MTU 256, but does not reduce further and will clear DF below 256.
 
 Good.  Could you send the test output of your tool again showing that
 it is working fine now?
 
 > Odd, the only difference between the two patches is
 >
 > if (V_path_mtu_discovery && tp->t_maxopd >= V_tcp_minmss)
 >
 > became:
 >
 > if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
 
 It was a logic error I made.  Since t_maxopd could never be lower than
 minmss, the >= test was always true and the DF flag was always set.
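To illustrate why the >= test never cleared DF, here is a small C sketch. It assumes, as the reply above states, that t_maxopd is never allowed to drop below minmss (216 by default). floored_maxopd(), df_first_patch(), df_final_patch() and count_df_clear() are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed default net.inet.tcp.minmss. */
enum { MINMSS = 216 };

/* Assumption: t_maxopd is floored at minmss, so an MSS request below
 * the floor collapses onto MINMSS. */
static int floored_maxopd(int mss)
{
    return mss < MINMSS ? MINMSS : mss;
}

/* DF predicate from the first patch: always true once floored. */
static bool df_first_patch(int maxopd)
{
    return maxopd >= MINMSS;
}

/* DF predicate from the committed patch: false exactly at the floor. */
static bool df_final_patch(int maxopd)
{
    return maxopd > MINMSS;
}

/* Count how many MSS values in [lo, hi] would be sent with DF clear. */
static int count_df_clear(bool (*pred)(int), int lo, int hi)
{
    int n = 0;

    assert(lo <= hi);
    for (int mss = lo; mss <= hi; mss++)
        if (!pred(floored_maxopd(mss)))
            n++;
    return n;
}
```

Over MSS values 1 to 1460, df_first_patch() leaves DF clear for none of them, while df_final_patch() clears DF for the 216 values (1 through 216) that collapse onto the floor.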
 
 > Anyway, it looks like it's fixed now. I like your patch better than mine.
 
 Thanks.  I'm going to commit the fix.
 
 -- 
 Andre
State-Changed-From-To: open->analyzed 
State-Changed-By: andre 
State-Changed-When: Sun Aug 15 10:04:05 UTC 2010 
State-Changed-Why:  
Cause found and fix provided; awaiting final confirmation 
and committing of patch. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146628 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146628: commit references a PR
Date: Sun, 15 Aug 2010 13:25:30 +0000 (UTC)

 Author: andre
 Date: Sun Aug 15 13:25:18 2010
 New Revision: 211333
 URL: http://svn.freebsd.org/changeset/base/211333
 
 Log:
   Fix the interaction between 'ICMP fragmentation needed' MTU updates,
   path MTU discovery and the tcp_minmss limiter for very small MTU's.
   
   When the MTU suggested by the gateway via ICMP, or if there isn't
   any the next smaller step from ip_next_mtu(), is lower than the
   floor enforced by net.inet.tcp.minmss (default 216) the value is
   ignored and the default MSS (512) is used instead.  However the
   DF flag in the IP header is still set in tcp_output() preventing
   fragmentation by the gateway.
   
   Fix this by using tcp_minmss as the MSS and clear the DF flag if
   the suggested MTU is too low.  This turns off path MTU discovery
   for the remainder of the session and allows fragmentation to be
   done by the gateway.
   
   Only MTU's smaller than 256 are affected.  The smallest official
   MTU specified is for AX.25 packet radio at 256 octets.
   
   PR:		kern/146628
   Tested by:	Matthew Luckie <mjl-at-luckie org nz>
   MFC after:	1 week
 
 Modified:
   head/sys/netinet/tcp_output.c
   head/sys/netinet/tcp_subr.c
 
 Modified: head/sys/netinet/tcp_output.c
 ==============================================================================
 --- head/sys/netinet/tcp_output.c	Sun Aug 15 13:07:08 2010	(r211332)
 +++ head/sys/netinet/tcp_output.c	Sun Aug 15 13:25:18 2010	(r211333)
 @@ -1186,8 +1186,10 @@ timer:
  	 * This might not be the best thing to do according to RFC3390
  	 * Section 2. However the tcp hostcache migitates the problem
  	 * so it affects only the first tcp connection with a host.
 +	 *
 +	 * NB: Don't set DF on small MTU/MSS to have a safe fallback.
  	 */
 -	if (V_path_mtu_discovery)
 +	if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
  		ip->ip_off |= IP_DF;
  
  	error = ip_output(m, tp->t_inpcb->inp_options, NULL,
 
 Modified: head/sys/netinet/tcp_subr.c
 ==============================================================================
 --- head/sys/netinet/tcp_subr.c	Sun Aug 15 13:07:08 2010	(r211332)
 +++ head/sys/netinet/tcp_subr.c	Sun Aug 15 13:25:18 2010	(r211333)
 @@ -1339,11 +1339,9 @@ tcp_ctlinput(int cmd, struct sockaddr *s
  					    if (!mtu)
  						mtu = ip_next_mtu(ip->ip_len,
  						 1);
 -					    if (mtu < max(296, V_tcp_minmss
 -						 + sizeof(struct tcpiphdr)))
 -						mtu = 0;
 -					    if (!mtu)
 -						mtu = V_tcp_mssdflt
 +					    if (mtu < V_tcp_minmss
 +						 + sizeof(struct tcpiphdr))
 +						mtu = V_tcp_minmss
  						 + sizeof(struct tcpiphdr);
  					    /*
  					     * Only cache the the MTU if it
 
State-Changed-From-To: analyzed->patched 
State-Changed-By: andre 
State-Changed-When: Sun Aug 15 14:04:19 UTC 2010 
State-Changed-Why:  

http://www.freebsd.org/cgi/query-pr.cgi?pr=146628 

From: Matthew Luckie <mjl@luckie.org.nz>
To: Andre Oppermann <oppermann@networx.ch>
Cc: bug-followup@freebsd.org
Subject: Re: kern/146628: [tcp] [patch] TCP does not clear DF when MTU is
 below a threshold
Date: Mon, 16 Aug 2010 09:45:59 +1200

 >>> Please try the attached patch with a small difference. It will clear
 >>> the DF bit in any case when the minmss floor is reached.
 >>
 >> That patch works for me. It reduces to 256, leaving the DF bit set for
 >> MTU 256, but does not reduce further and will clear DF below 256.
 >
 > Good. Could you send the test output of your tool again showing that
 > it is working fine now?
 
 The first two were run back to back.  The third was run after I rebooted 
 the patched machine.
 
 tbit from 192.168.1.11 to 192.168.1.24
   server-mss 1460, result: pmtud-success
   [  0.100] TX SYN       44  seq = 0:0             9c7e
   [  0.101] RX SYN/ACK   44  seq = 0:1             000c
   [  0.149] TX           40  seq = 1:1             9c7f
   [  0.213] TX          225  seq = 1:1(185)        9c80 DF
   [  0.250] RX          372  seq = 1:186(332)      000d DF
   [  0.251] TX           40  seq = 186:333         9c81
   [  0.254] RX         1500  seq = 333:186(1460)   000e DF
   [  0.299] TX PTB       56  mtu = 576
   [  0.299] RX          576  seq = 333:186(536)    000f DF
   [  0.299] RX          576  seq = 869:186(536)    0010 DF
   [  0.299] RX          576  seq = 1405:186(536)   0011 DF
   [  0.349] TX           40  seq = 186:869         9c83
   [  0.350] RX          576  seq = 1941:186(536)   0012 DF
   [  0.350] RX          576  seq = 2477:186(536)   0013 DF
   [  0.398] TX FIN       40  seq = 186:1405        9c84
   [  0.399] RX          576  seq = 3013:187(536)   0014 DF
   [  0.399] RX          576  seq = 3549:187(536)   0015 DF
   [  0.449] TX FIN       40  seq = 186:1941        9c85
   [  0.449] RX          576  seq = 4085:187(536)   0016 DF
   [  0.450] RX FIN       45  seq = 4621:187(5)     0017 DF
   [  0.503] TX FIN       40  seq = 186:2477        9c86
   [  0.504] RX FIN       40  seq = 4626:187        0018
   [  0.554] TX FIN       40  seq = 186:3013        9c87
   [  0.555] RX FIN       40  seq = 4626:187        0019
   [  0.598] TX           40  seq = 187:3549        9c88
   [  0.659] TX           40  seq = 187:4085        9c89
   [  0.710] TX           40  seq = 187:4621        9c8a
   [  0.763] TX           40  seq = 187:4627        9c8b
 
 tbit from 192.168.1.11 to 192.168.1.24
   server-mss 536, result: pmtud-cleardf
   [  0.056] TX SYN       44  seq = 0:0             f696
   [  0.056] RX SYN/ACK   44  seq = 0:1             001a
   [  0.106] TX           40  seq = 1:1             f697
   [  0.149] TX          225  seq = 1:1(185)        f698 DF
   [  0.150] RX          576  seq = 1:186(536)      001b DF
   [  0.210] TX PTB       56  mtu = 256
   [  0.211] RX          256  seq = 1:186(216)      001c
   [  0.211] RX          256  seq = 217:186(216)    001d
   [  0.211] RX          145  seq = 433:186(105)    001e
   [  0.249] TX           40  seq = 186:217         f69a
   [  0.250] RX          256  seq = 538:186(216)    001f
   [  0.250] RX          256  seq = 754:186(216)    0020
   [  0.298] TX FIN       40  seq = 186:433         f69b
   [  0.299] RX          256  seq = 970:187(216)    0021
   [  0.299] RX          256  seq = 1186:187(216)   0022
   [  0.366] TX FIN       40  seq = 186:538         f69c
   [  0.367] RX          250  seq = 1402:187(210)   0023
   [  0.398] TX FIN       40  seq = 186:754         f69d
   [  0.399] RX          256  seq = 1612:187(216)   0024
   [  0.399] RX          256  seq = 1828:187(216)   0025
   [  0.451] TX FIN       40  seq = 186:970         f69e
   [  0.451] RX          256  seq = 2044:187(216)   0026
   [  0.451] RX          256  seq = 2260:187(216)   0027
   [  0.505] TX           40  seq = 187:1186        f69f
   [  0.506] RX          256  seq = 2476:187(216)   0028
   [  0.506] RX          256  seq = 2692:187(216)   0029
   [  0.554] TX           40  seq = 187:1402        f6a0
   [  0.555] RX          256  seq = 2908:187(216)   002a
   [  0.555] RX          256  seq = 3124:187(216)   002b
   [  0.599] TX           40  seq = 187:1612        f6a1
   [  0.600] RX          256  seq = 3340:187(216)   002c
   [  0.648] TX           40  seq = 187:1828        f6a2
   [  0.649] RX          256  seq = 3556:187(216)   002d
   [  0.649] RX          256  seq = 3772:187(216)   002e
   [  0.705] TX           40  seq = 187:2044        f6a3
   [  0.706] RX          256  seq = 3988:187(216)   002f
   [  0.706] RX          256  seq = 4204:187(216)   0030
   [  0.750] TX           40  seq = 187:2260        f6a4
   [  0.751] RX FIN      246  seq = 4420:187(206)   0031
   [  0.811] TX           40  seq = 187:2476        f6a5
   [  0.854] TX           40  seq = 187:2692        f6a6
   [  0.899] TX           40  seq = 187:2908        f6a7
   [  0.955] TX           40  seq = 187:3124        f6a8
   [  1.003] TX           40  seq = 187:3340        f6a9
   [  1.048] TX           40  seq = 187:3556        f6aa
   [  1.100] TX           40  seq = 187:3772        f6ab
   [  1.155] TX           40  seq = 187:3988        f6ac
   [  1.199] TX           40  seq = 187:4204        f6ad
   [  1.249] TX           40  seq = 187:4420        f6ae
   [  1.310] TX           40  seq = 187:4627        f6af
 
 tbit from 192.168.1.11 to 192.168.1.24
   server-mss 1460, result: pmtud-cleardf
   [  0.104] TX SYN       44  seq = 0:0             1e05
   [  0.105] RX SYN/ACK   44  seq = 0:1             000c
   [  0.152] TX           40  seq = 1:1             1e06
   [  0.199] TX          225  seq = 1:1(185)        1e07 DF
   [  0.230] RX          372  seq = 1:186(332)      000d DF
   [  0.251] TX PTB       56  mtu = 255
   [  0.252] RX          256  seq = 1:186(216)      000e
   [  0.252] RX          256  seq = 217:186(216)    000f
   [  0.252] RX          256  seq = 433:186(216)    0010
   [  0.252] RX          256  seq = 649:186(216)    0011
   [  0.252] RX          256  seq = 865:186(216)    0012
   [  0.252] RX          256  seq = 1081:186(216)   0013
   [  0.300] TX           40  seq = 186:217         1e09
   [  0.301] RX          256  seq = 1297:186(216)   0014
   [  0.301] RX          256  seq = 1513:186(216)   0015
   [  0.347] TX FIN       40  seq = 186:433         1e0a
   [  0.348] RX          256  seq = 1729:187(216)   0016
   [  0.348] RX          256  seq = 1945:187(216)   0017
   [  0.400] TX FIN       40  seq = 186:649         1e0b
   [  0.401] RX          256  seq = 2161:187(216)   0018
   [  0.401] RX          256  seq = 2377:187(216)   0019
   [  0.449] TX FIN       40  seq = 186:865         1e0c
   [  0.449] RX          256  seq = 2593:187(216)   001a
   [  0.449] RX          256  seq = 2809:187(216)   001b
   [  0.498] TX FIN       40  seq = 186:1081        1e0d
   [  0.498] RX          256  seq = 3025:187(216)   001c
   [  0.498] RX          256  seq = 3241:187(216)   001d
   [  0.549] TX FIN       40  seq = 186:1297        1e0e
   [  0.550] RX          256  seq = 3457:187(216)   001e
   [  0.550] RX          256  seq = 3673:187(216)   001f
   [  0.597] TX FIN       40  seq = 186:1513        1e0f
   [  0.598] RX          256  seq = 3889:187(216)   0020
   [  0.598] RX          256  seq = 4105:187(216)   0021
   [  0.648] TX FIN       40  seq = 186:1729        1e10
   [  0.650] RX          256  seq = 4321:187(216)   0022
   [  0.650] RX FIN      129  seq = 4537:187(89)    0023
   [  0.699] TX           40  seq = 187:1945        1e11
   [  0.749] TX           40  seq = 187:2161        1e12
   [  0.804] TX           40  seq = 187:2377        1e13
   [  0.848] TX           40  seq = 187:2593        1e14
   [  0.898] TX           40  seq = 187:2809        1e15
   [  0.948] TX           40  seq = 187:3025        1e16
   [  0.998] TX           40  seq = 187:3241        1e17
   [  1.048] TX           40  seq = 187:3457        1e18
   [  1.099] TX           40  seq = 187:3673        1e19
   [  1.151] TX           40  seq = 187:3889        1e1a
   [  1.198] TX           40  seq = 187:4105        1e1b
   [  1.249] TX           40  seq = 187:4321        1e1c
   [  1.298] TX           40  seq = 187:4537        1e1d
   [  1.355] TX           40  seq = 187:4627        1e1e
State-Changed-From-To: patched->closed 
State-Changed-By: andre 
State-Changed-When: Mon Aug 23 14:24:43 UTC 2010 
State-Changed-Why:  
All MFC's done. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146628 
>Unformatted:
