From Dominic@indigo-ic.co.uk  Thu Sep 12 02:52:21 2002
Return-Path: <Dominic@indigo-ic.co.uk>
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0193637B400
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 12 Sep 2002 02:52:21 -0700 (PDT)
Received: from blueyonder.co.uk (pcow053o.blueyonder.co.uk [195.188.53.96])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B7EAB43E3B
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 12 Sep 2002 02:52:14 -0700 (PDT)
	(envelope-from Dominic@indigo-ic.co.uk)
Received: from pcow053o.blueyonder.co.uk ([127.0.0.1]) by blueyonder.co.uk  with Microsoft SMTPSVC(5.5.1877.757.75);
	 Thu, 12 Sep 2002 10:51:54 +0100
Received: from dom.Indigo-IC.co.uk (unverified [62.31.234.90]) by pcow053o.blueyonder.co.uk
 (Content Technologies SMTPRS 4.2.9) with ESMTP id <T5d4a1e7991ac1785c2149@pcow053o.blueyonder.co.uk> for <FreeBSD-gnats-submit@freebsd.org>;
 Thu, 12 Sep 2002 10:44:29 +0100
Message-Id: <5.1.0.14.2.20020912101640.00a04da0@pop.ntlworld.com>
Date: Thu, 12 Sep 2002 10:44:26 +0100
From: Dominic Froud <Dominic@Indigo-IC.co.uk>
To: FreeBSD-gnats-submit@freebsd.org
Subject: [PATCH] Wrong MTU in ICMP using IPSEC tunnels w/out GIF

>Number:         42689
>Category:       kern
>Synopsis:       [PATCH] Wrong MTU in ICMP using IPSEC tunnels w/out GIF
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 12 03:00:19 PDT 2002
>Closed-Date:    Thu Sep 19 11:14:30 PDT 2002
>Last-Modified:  Thu Sep 19 11:14:30 PDT 2002
>Originator:     Dominic Froud
>Release:        FreeBSD 4.6-RELEASE i386
>Organization:
>Environment:

 System: FreeBSD the-mayor.dom 4.6-RELEASE FreeBSD 4.6-RELEASE #17: Wed Sep 11 17:13:53 BST 2002 root@the-mayor.dom:/usr/src/sys/compile/SERVER i386
 
 Kernel options:
 INET
 INET6
 IPSEC
 IPSEC_ESP
 IPSEC_DEBUG
 MROUTING
 IPFIREWALL
 IPFIREWALL_FORWARD
 IPDIVERT
 RANDOM_IP_ID
 ICMP_BANDLIM
 
 Server has two Macronix 98715AEC-C 10/100BaseTX cards at dc0 and dc1.
 
 net.inet.ipsec.dfbit=1

>Description:

 I bridged my LAN (subnet 10.0.1.0/24) with a friend's LAN (10.0.0.0/24) 
 using IPSEC tunnels without GIF devices. I use FreeBSD 4.6 and he uses 
 Linux RedHat 7.x. My friend couldn't pull any packets from machines on my 
 LAN that required MTU reduction to prevent fragmentation, e.g. SMB TCP 
 packets. Upon further inspection, my FreeBSD server was telling the machine 
 on my LAN that fragmentation was needed but was suggesting an incorrect MTU 
 of 1500 instead of one that took the IPSEC tunnel headers into account. 
 This would cause the machine on my LAN to simply retry the same over-sized 
 packet again and again, causing the requesting machine on his LAN to 
 eventually time out with a short read. [The short-read timeout problem is a 
 common symptom of other MTU issues but this specific issue can be 
 accurately diagnosed].
 
 There is code in netinet/ip_input.c:ip_forward() that should deal with this 
 but it never has the chance to do the calculation because an earlier IPSEC 
 function call returns an error. In ip_forward(), before ip_output() is 
 called, a rough copy of the top mbuf at 'm' is made and pointed to by 
 'mcopy'. Only the IP header and up to 8 bytes are copied - but the length 
 as stored in the packet header (m_pkthdr) remains unchanged and reflects 
 the original packet length.
 
 If ip_forward()'s call to ip_output() fails with EMSGSIZE and the packet 
 would have traversed an IPSEC tunnel, then ipsec_setspidx() in 
 netinet6/ipsec.c would (eventually) be called. This would sanity-check the 
 passed mbuf and fail with an error like: "ipsec_setspidx: total of 
 m_len(28) != pkthdr.len(1500), ignored."
 
 The 28 is obviously the truncated length of mcopy (IP header + max 8 bytes) 
 and the 1500 is the size of the original packet. Hence the rest of the 
 reduced MTU calculation would be stopped at this point and an unchanged MTU 
 used to construct the ICMP frag-needed packet.
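
The mismatch described above can be reproduced in a small user-space model. This is only a sketch: struct toy_mbuf and the toy_* functions are hypothetical stand-ins invented for illustration, not the kernel's struct mbuf or the real ipsec_setspidx() code. It assumes a 20-byte IP header with no options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel mbuf (not the real struct). */
struct toy_mbuf {
    struct toy_mbuf *m_next;  /* next buffer in the chain, or NULL */
    int m_len;                /* bytes of data held in this buffer */
    int pkthdr_len;           /* recorded total packet length (first buffer only) */
};

/* Add up m_len over every buffer in the chain. */
static int toy_chain_len(const struct toy_mbuf *m)
{
    int total = 0;

    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Return 1 if the recorded packet length matches the data actually present. */
static int toy_chain_is_sane(const struct toy_mbuf *m)
{
    assert(m != NULL);
    return toy_chain_len(m) == m->pkthdr_len;
}

/*
 * Model the truncated copy from the description: the IP header plus at most
 * 8 further bytes are kept, but the recorded packet length is left at the
 * original packet's length.
 */
static struct toy_mbuf toy_truncated_copy(int ip_hdr_len, int orig_len)
{
    struct toy_mbuf mcopy;
    int copied = ip_hdr_len + 8;

    if (copied > orig_len)
        copied = orig_len;
    mcopy.m_next = NULL;
    mcopy.m_len = copied;
    mcopy.pkthdr_len = orig_len;
    return mcopy;
}
```

Calling toy_truncated_copy(20, 1500) gives a buffer with m_len 28 and a recorded length of 1500, so toy_chain_is_sane() rejects it. Those are the same two numbers that appear in the kernel message quoted above.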
 
  >How-To-Repeat:
 Bridge two subnets using IPSEC tunnels without the GIF device. If you 
 bridge the encapsulating machines themselves as well, you should end up 
 with 8 policies like the following:
 
 81.5.133.243[any] 10.0.1.0/24[any] any
          in ipsec
          esp/tunnel/81.5.133.243-62.31.234.90/require
          spid=1 seq=7 pid=235
          refcnt=1
 81.5.133.243[any] 62.31.234.90[any] any
          in ipsec
          esp/tunnel/81.5.133.243-62.31.234.90/require
          spid=3 seq=6 pid=235
          refcnt=1
 10.0.0.0/24[any] 10.0.1.0/24[any] any
          in ipsec
          esp/tunnel/81.5.133.243-62.31.234.90/require
          spid=5 seq=5 pid=235
          refcnt=1
 10.0.0.0/24[any] 62.31.234.90[any] any
          in ipsec
          esp/tunnel/81.5.133.243-62.31.234.90/require
          spid=7 seq=4 pid=235
          refcnt=1
 10.0.1.0/24[any] 81.5.133.243[any] any
          out ipsec
          esp/tunnel/62.31.234.90-81.5.133.243/require
          spid=2 seq=3 pid=235
          refcnt=1
 62.31.234.90[any] 81.5.133.243[any] any
          out ipsec
          esp/tunnel/62.31.234.90-81.5.133.243/require
          spid=4 seq=2 pid=235
          refcnt=1
 10.0.1.0/24[any] 10.0.0.0/24[any] any
          out ipsec
          esp/tunnel/62.31.234.90-81.5.133.243/require
          spid=6 seq=1 pid=235
          refcnt=1
 62.31.234.90[any] 10.0.0.0/24[any] any
          out ipsec
          esp/tunnel/62.31.234.90-81.5.133.243/require
          spid=8 seq=0 pid=235
          refcnt=1
 
 I am 62.31.234.90 with protected subnet 10.0.1.0/24.
 Peer is 81.5.133.243 with protected subnet 10.0.0.0/24.
 
 I also have net.inet.ipsec.dfbit set to 1 via /etc/sysctl.conf.
 
 I logged into the peer's server and used smbclient to request a file from 
 10.0.1.20 (win98se machine). Before each test, make sure all your SAD 
 entries are 'mature' and relatively fresh (i.e. not about to die on you 
 during your test) using "setkey -D | egrep '(diff|state)'".
 
 Use tcpdump to log data packets from, and icmp packets to, your protected 
 host (in my case this was 10.0.1.20). Increase IPSEC logging using "sysctl 
 net.key.debug=0x45". To turn these messages off, use "sysctl net.key.debug=0".
 
 Now try to transfer a file from your target host that is bigger than your 
 MTU (more than 1500 bytes; say, 16 Kbytes).
 
 tcpdump will produce output like:
 
 11:44:03.378193 10.0.1.20.139 > 81.5.133.243.43396: tcp 1460 (DF) (ttl 128, id 26226, len 1500)
 11:44:03.387030 10.0.1.2 > 10.0.1.20: icmp: 81.5.133.243 unreachable - need to frag (mtu 1500) (DF) (ttl 64, id 48070, len 56)
 11:44:04.778191 10.0.1.20.139 > 81.5.133.243.43396: tcp 1460 (DF) (ttl 128, id 26226, len 1500)
 11:44:04.787022 10.0.1.2 > 10.0.1.20: icmp: 81.5.133.243 unreachable - need to frag (mtu 1500) (DF) (ttl 64, id 48070, len 56)
 (pattern repeats)
 
 Your console should show lines like:
 Sep 10 11:44:03 the-mayor /kernel: ipsec_setspidx: total of m_len(28) != pkthdr.len(1500), ignored.
 
 The requesting host on the remote LAN will time out.
 
  >Fix:
 Simply update the packet length in mcopy->m_pkthdr.len to reflect the 
 truncated nature of mcopy. This can be done at line 1799 in 
 netinet/ip_input.c rev 1.130.2.35 for just the EMSGSIZE/IPSEC case or at 
 line 1703 if this is of more general use within ip_forward() and functions 
 called by it. I've tried the following diff at both line 1703 and line 1799 
 and both cure the problem as expected. On my machine, I've left the code in 
 at line 1799 because I don't know if other code using mcopy makes use of 
 the original packet length.
 
 
 --- patch begins here ---
 --- ip_input.c  Wed Sep 11 17:55:09 2002
 +++ ip_input.c-patched  Wed Sep 11 18:23:47 2002
 @@ -1796,6 +1796,13 @@
                          int ipsechdr;
                          struct route *ro;
 
 +                       /* Pretend original packet was only this long as
 +                        * IPSEC functions like ipsec_setspidx(), called by
 +                        * ipsec4_getpolicybyaddr() below for the EMSGSIZE case,
 +                        * expect a sane mbuf chain.
 +                        */
 +                       mcopy->m_pkthdr.len = mcopy->m_len;
 +
                          sp = ipsec4_getpolicybyaddr(mcopy,
                                                      IPSEC_DIR_OUTBOUND,
                                                      IP_FORWARDING,
 --- patch ends here ---

 tcpdump with patched kernel looks like:

 17:17:43.108193 10.0.1.20.139 > 81.5.133.243.43396: tcp 1460 (DF) (ttl 128, id 26226, len 1500)
 17:17:43.108779 10.0.1.20.139 > 81.5.133.243.43396: tcp 652 (DF) (ttl 128, id 26482, len 692)
 17:17:43.114394 10.0.1.2 > 10.0.1.20: icmp: 81.5.133.243 unreachable - need to frag (mtu 1443) (DF) (ttl 64, id 39851, len 56)
 17:17:43.115869 10.0.1.20.139 > 81.5.133.243.43396: tcp 1403 (DF) (ttl 128, id 26738, len 1443)
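
The lengths in the traces can be cross-checked with simple arithmetic. The helpers below are a hypothetical sketch, not kernel code. They assume 20-byte IPv4 and 20-byte TCP headers with no options, which is what the tcp/len pairs in the traces imply, and a 1500-byte link MTU as reported in the unpatched ICMP message.

```c
#include <assert.h>

/* Assumed header sizes: IPv4 and TCP without options. */
enum { IPV4_HDR_LEN = 20, TCP_HDR_LEN = 20 };

/* IP datagram length for a TCP segment carrying `payload` bytes of data. */
static int ip_len_for_tcp_payload(int payload)
{
    assert(payload >= 0);
    return payload + IPV4_HDR_LEN + TCP_HDR_LEN;
}

/* Per-packet overhead implied by an advertised MTU relative to the link MTU. */
static int implied_overhead(int link_mtu, int advertised_mtu)
{
    assert(advertised_mtu <= link_mtu);
    return link_mtu - advertised_mtu;
}
```

With these, ip_len_for_tcp_payload(1460) is 1500 and ip_len_for_tcp_payload(1403) is 1443, matching the "len" values before and after the ICMP message, and implied_overhead(1500, 1443) is 57 bytes. That figure is just the difference between the two advertised MTUs; how the kernel arrives at it is not modeled here.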

>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: gnats-admin->freebsd-bugs 
Responsible-Changed-By: keramida 
Responsible-Changed-When: Thu Sep 19 10:39:37 PDT 2002 
Responsible-Changed-Why:  
Refile misfiled PR under kern/* and assign to freebsd-bugs. 

To the originator: 
When filling in the fields of the send-pr template, please only append 
text to the single-line field lines that start with '>' characters. 
Do not fill, justify, or indent those lines. 

Thanks for your submission :-) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=42689 
State-Changed-From-To: open->closed 
State-Changed-By: maxim 
State-Changed-When: Thu Sep 19 11:13:35 PDT 2002 
State-Changed-Why:  
Duplicate of kern/42727. Closed at submitter's request. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=42689 
>Unformatted:
