From nobody@FreeBSD.org  Tue Nov 13 21:50:12 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 70F6616A41A
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 13 Nov 2007 21:50:12 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 66A6413C50E
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 13 Nov 2007 21:50:12 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.1/8.14.1) with ESMTP id lADLnhVd063057
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 13 Nov 2007 21:49:43 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.1/8.14.1/Submit) id lADLng0A063056;
	Tue, 13 Nov 2007 21:49:42 GMT
	(envelope-from nobody)
Message-Id: <200711132149.lADLng0A063056@www.freebsd.org>
Date: Tue, 13 Nov 2007 21:49:42 GMT
From: Nikolay Govoruha <bardano@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [PATCH]
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         118026
>Category:       kern
>Synopsis:       [netinet] [patch] MTU field not being set in certain cases with IPSEC
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bz
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 13 22:00:03 UTC 2007
>Closed-Date:    Sat Dec 29 21:52:58 UTC 2007
>Last-Modified:  Sat Dec 29 21:52:58 UTC 2007
>Originator:     Nikolay Govoruha
>Release:        FreeBSD 6.2 Release
>Organization:
VITAL
>Environment:
FreeBSD plant.vital.dp.ua 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Tue Nov 13 00:26:02 UTC 2007     root@plant.vital.dp.ua:/usr/src/sys/i386/compile/VITAL  i386
>Description:
This is a bug in Path MTU Discovery (RFC 1191).  When the IPSEC option is
enabled in the kernel configuration file, the following behaviour occurs.
One host sends an IP packet of size 1500 with the DF (Don't Fragment) bit
set to another host.  The gateway (FreeBSD 6.2-RELEASE) has a route for
this packet with mtu=1408, and net.inet.tcp.path_mtu_discovery is 1.
The gateway cannot forward the packet to the next gateway in this case.
In response, the gateway sends the sender an ICMP packet with type
ICMP_UNREACH (0x03) and code ICMP_UNREACH_NEEDFRAG (0x04).  However, the
gateway does not set the MTU field in that packet; the field is 0x0000.
tcpdump output:

//*****************************************************************************

pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
09:42:39.379247 IP (tos 0x0, ttl  63, id 23385, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
        IP (tos 0x0, ttl 126, id 60516, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c1 (->2ff3)!) 10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 5b59 4000 3f01 05a0 505d 761e
        0x0010:  0a00 0a51 0304 9eab 0000 0000 4500 05d4
        0x0020:  ec64 4000 7e06 34c1 0a00 0a51 505d 761e
        0x0030:  0669 152d 97bf a62c
09:42:39.379644 IP (tos 0x0, ttl  63, id 23386, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag, length 36
        IP (tos 0x0, ttl 126, id 60517, offset 0, flags [DF], proto: TCP (6), length: 1492, bad cksum 34c0 (->2ff2)!) 10.0.10.81.1641 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 5b5a 4000 3f01 059f 505d 761e
        0x0010:  0a00 0a51 0304 98ff 0000 0000 4500 05d4
        0x0020:  ec65 4000 7e06 34c0 0a00 0a51 505d 761e
        0x0030:  0669 152d 97bf abd8

//*****************************************************************************

>How-To-Repeat:
Transfer a file over an FTP connection and check the "next hop mtu"
field (RFC 1191) in the tcpdump output.
>Fix:
I made the following patch to sys/netinet/ip_input.c and rebuilt the kernel.

Original - Line 1948:

//*****************************************************************************

	case EMSGSIZE:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_NEEDFRAG;
#if defined(IPSEC) || defined(FAST_IPSEC)
		/*
		 * If the packet is routed over IPsec tunnel, tell the
		 * originator the tunnel MTU.
		 *	tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz
		 * XXX quickhack!!!
		 */
		{
			struct secpolicy *sp = NULL;
			int ipsecerror;
			int ipsechdr;
			struct route *ro;

#ifdef IPSEC
			sp = ipsec4_getpolicybyaddr(mcopy,
						    IPSEC_DIR_OUTBOUND,
						    IP_FORWARDING,
						    &ipsecerror);
#else /* FAST_IPSEC */
			sp = ipsec_getpolicybyaddr(mcopy,
						   IPSEC_DIR_OUTBOUND,
						   IP_FORWARDING,
						   &ipsecerror);
#endif
			if (sp != NULL) {
				/* count IPsec header size */
				ipsechdr = ipsec4_hdrsiz(mcopy,
							 IPSEC_DIR_OUTBOUND,
							 NULL);

				/*
				 * find the correct route for outer IPv4
				 * header, compute tunnel MTU.
				 */
				if (sp->req != NULL
				 && sp->req->sav != NULL
				 && sp->req->sav->sah != NULL) {
					ro = &sp->req->sav->sah->sa_route;
					if (ro->ro_rt && ro->ro_rt->rt_ifp) {
						mtu =
						    ro->ro_rt->rt_rmx.rmx_mtu ?
						    ro->ro_rt->rt_rmx.rmx_mtu :
						    ro->ro_rt->rt_ifp->if_mtu;
						mtu -= ipsechdr;
					}
				}

#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				ipstat.ips_cantfrag++;
				break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;

//*****************************************************************************

I used printf() to debug the problem.  IPSEC was defined in my kernel.
In my case sp = ipsec4_getpolicybyaddr(......) returned a non-NULL value,
but sp->req was NULL.  The "if (sp != NULL){}" block is therefore
executed, but mtu is not calculated and stays zero, and the block ends
with a "break;" statement.  So mtu is still zero after the
"switch (error)" statement, and zero is written to the "mtu" field of
the ICMP packet.

Is it a bug?

I resolved this problem in the following way:

//*****************************************************************************

#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				//ipstat.ips_cantfrag++;
				//break;
			}
		}
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * If the MTU wasn't set before use the interface mtu or
		 * fall back to the next smaller mtu step compared to the
		 * current packet size.
		 */
		if (mtu == 0) {
			if (ia != NULL)
				mtu = ia->ia_ifp->if_mtu;
			else
				mtu = ip_next_mtu(ip->ip_len, 0);
		}
		ipstat.ips_cantfrag++;
		break;

//*****************************************************************************

That is, by commenting out the "break" statement and the statement
before it.  In this case, if mtu is still zero, the following code is
executed; it is the code that always runs when neither IPSEC nor
FAST_IPSEC is defined.  The tcpdump result:

//*****************************************************************************

pvs# tcpdump -i rl1 -vv -x icmp
tcpdump: listening on rl1, link-type EN10MB (Ethernet), capture size 96 bytes
12:13:48.471242 IP (tos 0x0, ttl  63, id 20521, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
        IP (tos 0x0, ttl 126, id 50667, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b32 (->5664)!) 10.0.10.81.1769 > 80.93.118.30.5421:  tcp 1476 [bad hdr length 4 - too short, < 20]
        0x0000:  4500 0038 5029 4000 3f01 10d0 505d 761e
        0x0010:  0a00 0a51 0304 7408 0000 0580 4500 05dc
        0x0020:  c5eb 4000 7e06 5b32 0a00 0a51 505d 761e
        0x0030:  06e9 152d 5af0 079f
12:13:48.471583 IP (tos 0x0, ttl  63, id 20522, offset 0, flags [DF], proto: ICMP (1), length: 56) 80.93.118.30 > 10.0.10.81: ICMP 80.93.118.30 unreachable - need to frag (mtu 1408), length 36
        IP (tos 0x0, ttl 126, id 50668, offset 0, flags [DF], proto: TCP (6), length: 1500, bad cksum 5b31 (->5663)!) 10.0.10.81.1769 > 80.93.118.30.5421: [|tcp]
        0x0000:  4500 0038 502a 4000 3f01 10cf 505d 761e
        0x0010:  0a00 0a51 0304 6e54 0000 0580 4500 05dc
        0x0020:  c5ec 4000 7e06 5b31 0a00 0a51 505d 761e
        0x0030:  06e9 152d 5af0 0d53

//*****************************************************************************

You can see that the "next hop mtu" field now has the correct value:
0x0580 = 1408 decimal.

Please tell me, is this patch correct?

mailto:bardano@gmail.com

P.S. The "bad cksum 5b31 (->5663)!" is from a packet after natd; maybe I
have an incorrect natd configuration.

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->kmacy 
Responsible-Changed-By: kmacy 
Responsible-Changed-When: Fri Nov 16 02:58:39 UTC 2007 
Responsible-Changed-Why:  

assign to self to keep track 

http://www.freebsd.org/cgi/query-pr.cgi?pr=118026 
Responsible-Changed-From-To: kmacy->bz 
Responsible-Changed-By: bz 
Responsible-Changed-When: Sat Dec 29 00:01:17 UTC 2007 
Responsible-Changed-Why:  
Pinch this. I have just been reading that code. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=118026 

From: "Bjoern A. Zeeb" <bz@FreeBSD.org>
To: bug-followup@FreeBSD.org, bardano@gmail.com
Cc:  
Subject: Re: kern/118026: [netinet] [patch] MTU field not being set in certain
 cases with IPSEC
Date: Sat, 29 Dec 2007 00:15:34 +0000 (UTC)

 Hi,
 
 to my understanding your patches just removed the statistics and break
 from the IPSEC block?
 
 That problem was fixed 22 months ago with revision 1.313 of ip_input.c
 and the new ip_ipsec.c to my reading and should be gone with RELENG_7
 or HEAD. Can you confirm this?
 
 The changes were never MFCed to RELENG_6 so they still exist there and
 it'll be too late for the upcoming releng_6 release I guess but if you
 can send a unified diff or confirm that you basically just commented
 out the break I'll handle that.
 
 
 /bz
 
 -- 
 Bjoern A. Zeeb                                 bzeeb at Zabbadoz dot NeT
 Software is harder than hardware  so better get it right the first time.
State-Changed-From-To: open->closed 
State-Changed-By: bz 
State-Changed-When: Sat Dec 29 21:51:24 UTC 2007 
State-Changed-Why:  
Duplicate of kern/91412 which was fixed just after 6.2-Release with 
this commit: 

------------------------------------------------------------------------ 
bz 2006-11-19 10:07:08 UTC 

FreeBSD src repository 

Modified files: (Branch: RELENG_6) 
sys/netinet ip_input.c 
Log: 
Fix PMTU discovery in IPsec case by using an MTU hint in ICMP unreachable 
fragmentation needed other than 0 when we cannot get a security policy. 
This changes the code path to match what we have had in HEAD since 
rev. 1.312. 

PR: kern/91412 
Submitted by: Tom Judge <tom tomjudge.com> 

Revision Changes Path 
1.301.2.11 +0 -2 src/sys/netinet/ip_input.c 
------------------------------------------------------------------------ 

http://www.freebsd.org/cgi/query-pr.cgi?pr=118026 
>Unformatted:
My own reading of the code leads me to believe that we don't try to re-evaluate the pmtu often enough. Assign to self 
to prevent losing track.
