From nobody@FreeBSD.org  Fri Apr 30 16:35:41 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6196D106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 30 Apr 2010 16:35:41 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [69.147.83.33])
	by mx1.freebsd.org (Postfix) with ESMTP id 50AA58FC1A
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 30 Apr 2010 16:35:41 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o3UGZegN023588
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 30 Apr 2010 16:35:40 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o3UGZeQZ023587;
	Fri, 30 Apr 2010 16:35:40 GMT
	(envelope-from nobody)
Message-Id: <201004301635.o3UGZeQZ023587@www.freebsd.org>
Date: Fri, 30 Apr 2010 16:35:40 GMT
From: Denis Antrushin <DAntrushin@mail.ru>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ipsec][patch] NAT traversal does not work in transport mode
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         146190
>Category:       kern
>Synopsis:       [ipsec][patch] NAT traversal does not work in transport mode
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    vanhu
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 30 16:40:01 UTC 2010
>Closed-Date:    
>Last-Modified:  Fri May 14 16:40:01 UTC 2010
>Originator:     Denis Antrushin
>Release:        8.0-STABLE
>Organization:
>Environment:
FreeBSD XXX 8.0-STABLE FreeBSD 8.0-STABLE #3: Fri Apr 30 13:12:43 MSD 2010     adu@XXX:/usr/obj/usr/src/sys/ADU_NB  amd64
>Description:
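IPSEC NAT traversal does not work in transport mode.
The NAT box rewrites the IP addresses that are covered by the TCP/UDP
pseudo-header checksum, but it cannot update the checksum itself because it
sits inside the encrypted ESP payload. Transport mode therefore requires fixing
up the TCP/UDP checksums of packets protected by ESP (or ignoring checksum
mismatches), and this is not implemented in the kernel.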
>How-To-Repeat:
Try to set up an IPSEC transport-mode connection between two hosts with a NAT
box in between. Observe that incoming packets are dropped by the kernel due to
a checksum mismatch.
>Fix:
There are two ways to handle this issue: recalculate the TCP/UDP checksums
using the NAT-OA (original address) information from the IKE exchange, or
simply ignore the checksums of packets protected by ESP.
The former requires support from the IKE daemon.

The attached prototype patch implements both approaches.
I haven't tried isakmpd, but racoon currently does not send NAT-OAi/NAT-OAr
information to the kernel, so without patching racoon only the option of
ignoring checksum mismatches is available (controlled in the patch by the
net.inet.esp.esp_ignore_natt_cksum sysctl).
 

Patch attached with submission follows:

--- esp_var.h.orig	2010-04-30 19:42:06.000000000 +0400
+++ esp_var.h	2010-04-30 12:12:23.000000000 +0400
@@ -76,5 +76,7 @@
 #define	V_esp_enable	VNET(esp_enable)
 VNET_DECLARE(struct espstat, espstat);
 #define	V_espstat	VNET(espstat)
+VNET_DECLARE(int, esp_ignore_natt_cksum);
+#define V_esp_ignore_natt_cksum	    VNET(esp_ignore_natt_cksum)
 #endif /* _KERNEL */
 #endif /*_NETIPSEC_ESP_VAR_H_*/
--- ipsec.c.orig	2010-04-30 19:42:35.000000000 +0400
+++ ipsec.c	2010-04-30 13:11:12.000000000 +0400
@@ -592,7 +592,7 @@
 	IPSEC_ASSERT(m->m_pkthdr.len >= sizeof(struct ip),("packet too short"));
 
 	/* NB: ip_input() flips it into host endian. XXX Need more checking. */
-	if (m->m_len < sizeof (struct ip)) {
+	if (m->m_len >= sizeof (struct ip)) {
 		struct ip *ip = mtod(m, struct ip *);
 		if (ip->ip_off & (IP_MF | IP_OFFMASK))
 			goto done;
--- ipsec_input.c.orig	2009-08-03 12:13:06.000000000 +0400
+++ ipsec_input.c	2010-04-30 12:23:24.000000000 +0400
@@ -76,6 +76,11 @@
 #include <netinet/icmp6.h>
 #endif
 
+#ifdef IPSEC_NAT_T
+#include <netinet/tcp.h>
+#include <netinet/udp.h>
+#endif
+
 #include <netipsec/ipsec.h>
 #ifdef INET6
 #include <netipsec/ipsec6.h>
@@ -347,6 +352,34 @@
 	}
 	prot = ip->ip_p;
 
+#ifdef IPSEC_NAT_T
+	if (saidx->mode == IPSEC_MODE_TRANSPORT && sproto == IPPROTO_ESP &&
+	    sav->natt_cksum != 0) {
+		if (V_esp_ignore_natt_cksum != 0) {
+			/* Ignore checksum of packet protected by ESP.  */
+			if (prot == IPPROTO_TCP || prot == IPPROTO_UDP) {
+				m->m_pkthdr.csum_flags |= (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
+				m->m_pkthdr.csum_data = 0xffff;
+
+			}
+		} else {
+			if (prot == IPPROTO_TCP || prot == IPPROTO_UDP) {
+				u_int16_t proto_cksum;
+				int off = sizeof(struct ip);
+				if (prot == IPPROTO_TCP) {
+					off += offsetof(struct tcphdr, th_sum);
+				} else if (prot == IPPROTO_UDP) {
+					off += offsetof(struct udphdr, uh_sum);
+				}
+				m_copydata(m, off, sizeof(u_int16_t), (caddr_t)&proto_cksum);
+				proto_cksum = in_addword(sav->natt_cksum, ~ntohs(proto_cksum));
+				proto_cksum = ~htons(proto_cksum);
+				m_copyback(m, off, sizeof(u_int16_t), (caddr_t)&proto_cksum);
+			}
+		}
+	}
+#endif
+
 #ifdef notyet
 	/* IP-in-IP encapsulation */
 	if (prot == IPPROTO_IPIP) {
--- key.c.orig	2009-08-03 12:13:06.000000000 +0400
+++ key.c	2010-04-30 12:09:55.000000000 +0400
@@ -459,6 +459,8 @@
 #ifdef IPSEC_NAT_T
 static struct mbuf *key_setsadbxport(u_int16_t, u_int16_t);
 static struct mbuf *key_setsadbxtype(u_int16_t);
+static u_int16_t key_compute_natt_cksum(struct sockaddr*, 
+	struct sockaddr*, struct sockaddr*, struct sockaddr*);
 #endif
 static void key_porttosaddr(struct sockaddr *, u_int16_t);
 #define	KEY_PORTTOSADDR(saddr, port)				\
@@ -3083,6 +3085,7 @@
 	/*  Initialize even if NAT-T not compiled in: */
 	sav->natt_type = 0;
 	sav->natt_esp_frag_len = 0;
+	sav->natt_cksum = 0;
 
 	/* SA */
 	if (mhp->ext[SADB_EXT_SA] != NULL) {
@@ -3505,7 +3508,19 @@
 			break;
 
 		case SADB_X_EXT_NAT_T_OAI:
+			m = key_setsadbaddr(SADB_X_EXT_NAT_T_OAI,
+			    &sav->natt_oa_src.sa,
+			    FULLMASK, IPSEC_ULPROTO_ANY);
+			if (!m)
+				goto fail;
+			break;
 		case SADB_X_EXT_NAT_T_OAR:
+			m = key_setsadbaddr(SADB_X_EXT_NAT_T_OAR,
+			    &sav->natt_oa_dst.sa,
+			    FULLMASK, IPSEC_ULPROTO_ANY);
+			if (!m)
+				goto fail;
+			break;
 		case SADB_X_EXT_NAT_T_FRAG:
 			/* We do not (yet) support those. */
 			continue;
@@ -3786,6 +3801,56 @@
 			__func__, sa->sa_family));
 	return (0);
 }
+
+/* 
+ * Compute checksum delta to be applied to incoming TCP/UDP packet
+ * after packet has been decrypted
+ */
+static u_int16_t
+key_compute_natt_cksum(struct sockaddr *src, struct sockaddr *dst,
+	struct sockaddr *natt_src, struct sockaddr *natt_dst)
+{
+	u_int32_t total_sum = 0;
+	u_int32_t sum_old, sum_new;
+	if (natt_src && key_sockaddrcmp(src, natt_src, 0)) {
+		IPSEC_ASSERT(src->sa_family == AF_INET, ("bad address family"));
+		sum_old = *(u_int32_t*)(&((struct sockaddr_in*)src)->sin_addr);
+		sum_old = ntohl(sum_old);
+		sum_old = (sum_old & 0xFFFF) + (sum_old >> 16);
+		sum_old = (sum_old & 0xFFFF) + (sum_old >> 16);
+
+		sum_new = *(u_int32_t*)(&((struct sockaddr_in*)natt_src)->sin_addr);
+		sum_new = ntohl(sum_new);
+		sum_new = (sum_new & 0xFFFF) + (sum_new >> 16);
+		sum_new = (sum_new & 0xFFFF) + (sum_new >> 16);
+
+		if (sum_new < sum_old)
+			sum_new--;
+
+		total_sum += sum_new - sum_old;
+	}
+	if (natt_dst && key_sockaddrcmp(dst, natt_dst, 0)) {
+		IPSEC_ASSERT(dst->sa_family == AF_INET, ("bad address family"));
+		sum_old = *(u_int32_t*)(&((struct sockaddr_in*)natt_dst)->sin_addr);
+		sum_old = ntohl(sum_old);
+		sum_old = (sum_old & 0xFFFF) + (sum_old >> 16);
+		sum_old = (sum_old & 0xFFFF) + (sum_old >> 16);
+
+		sum_new = *(u_int32_t*)(&((struct sockaddr_in*)dst)->sin_addr);
+		sum_new = ntohl(sum_new);
+		sum_new = (sum_new & 0xFFFF) + (sum_new >> 16);
+		sum_new = (sum_new & 0xFFFF) + (sum_new >> 16);
+
+		if (sum_new < sum_old)
+			sum_new--;
+
+		total_sum += sum_new - sum_old;
+	}
+	total_sum = (total_sum & 0xFFFF) + (total_sum >> 16);
+	total_sum = (total_sum & 0xFFFF) + (total_sum >> 16);
+	return (u_int16_t)total_sum;
+}
+
 #endif /* IPSEC_NAT_T */
 
 /*
@@ -4656,7 +4721,7 @@
 	struct mbuf *m;
 	const struct sadb_msghdr *mhp;
 {
-	struct sadb_address *src0, *dst0;
+	struct sadb_address *src0, *dst0, *iaddr, *raddr;
 	struct secasindex saidx;
 	struct secashead *newsah;
 	struct secasvar *newsav;
@@ -4747,10 +4812,24 @@
 	 * We made sure the port numbers are zero above, so we do
 	 * not have to worry in case we do not update them.
 	 */
-	if (mhp->ext[SADB_X_EXT_NAT_T_OAI] != NULL)
+	if (mhp->ext[SADB_X_EXT_NAT_T_OAI] != NULL) {
 		ipseclog((LOG_DEBUG, "%s: NAT-T OAi present\n", __func__));
-	if (mhp->ext[SADB_X_EXT_NAT_T_OAR] != NULL)
+		if (mhp->extlen[SADB_X_EXT_NAT_T_OAI] < sizeof(struct sadb_address)) {
+			ipseclog((LOG_DEBUG, "%s: invalid message is passed.\n",
+			    __func__));
+			return key_senderror(so, m, EINVAL);
+		}
+		iaddr = (struct sadb_address *)(mhp->ext[SADB_X_EXT_NAT_T_OAI]);
+	}
+	if (mhp->ext[SADB_X_EXT_NAT_T_OAR] != NULL) {
 		ipseclog((LOG_DEBUG, "%s: NAT-T OAr present\n", __func__));
+		if (mhp->extlen[SADB_X_EXT_NAT_T_OAR] < sizeof(struct sadb_address)) {
+			ipseclog((LOG_DEBUG, "%s: invalid message is passed.\n",
+			    __func__));
+			return key_senderror(so, m, EINVAL);
+		}
+		raddr = (struct sadb_address *)(mhp->ext[SADB_X_EXT_NAT_T_OAR]);
+	}
 
 	if (mhp->ext[SADB_X_EXT_NAT_T_TYPE] != NULL &&
 	    mhp->ext[SADB_X_EXT_NAT_T_SPORT] != NULL &&
@@ -5081,6 +5160,11 @@
 		iaddr = (struct sadb_address *)mhp->ext[SADB_X_EXT_NAT_T_OAI];
 		raddr = (struct sadb_address *)mhp->ext[SADB_X_EXT_NAT_T_OAR];
 		ipseclog((LOG_DEBUG, "%s: NAT-T OAi/r present\n", __func__));
+	} else if (mhp->ext[SADB_X_EXT_NAT_T_OA] != NULL) {
+	    iaddr = (struct sadb_address *)mhp->ext[SADB_X_EXT_NAT_T_OA];
+	    raddr = NULL;
+	    ipseclog((LOG_DEBUG, "%s: NAT-T OA present\n", __func__));
+
 	} else {
 		iaddr = raddr = NULL;
 	}
@@ -5177,6 +5261,16 @@
 	if (dport)
 		KEY_PORTTOSADDR(&sav->sah->saidx.dst,
 		    dport->sadb_x_nat_t_port_port);
+	if (iaddr)
+		bcopy(iaddr + 1, &sav->natt_oa_src, ((const struct sockaddr *)(iaddr + 1))->sa_len);
+	if (raddr)
+		bcopy(raddr + 1, &sav->natt_oa_dst, ((const struct sockaddr *)(raddr + 1))->sa_len);
+	if (sav->sah->saidx.src.sa.sa_family == AF_INET) {
+		struct sockaddr *natt_src_sa = iaddr ? &sav->natt_oa_src.sa : NULL;
+		struct sockaddr *natt_dst_sa = raddr ? &sav->natt_oa_dst.sa : NULL;
+		sav->natt_cksum = key_compute_natt_cksum(&sav->sah->saidx.src.sa,
+		    &sav->sah->saidx.dst.sa, natt_src_sa, natt_dst_sa);
+	}
 
 #if 0
 	/*
@@ -5377,6 +5471,11 @@
 		iaddr = (struct sadb_address *)mhp->ext[SADB_X_EXT_NAT_T_OAI];
 		raddr = (struct sadb_address *)mhp->ext[SADB_X_EXT_NAT_T_OAR];
 		ipseclog((LOG_DEBUG, "%s: NAT-T OAi/r present\n", __func__));
+	} else if (mhp->ext[SADB_X_EXT_NAT_T_OA] != NULL) {
+		iaddr = (struct sadb_address *)mhp->ext[SADB_X_EXT_NAT_T_OAI];
+		raddr = NULL;
+		ipseclog((LOG_DEBUG, "%s: NAT-T OA present\n", __func__));
+
 	} else {
 		iaddr = raddr = NULL;
 	}
@@ -5436,6 +5535,16 @@
 	 */
 	if (type)
 		newsav->natt_type = type->sadb_x_nat_t_type_type;
+	if (iaddr)
+		bcopy(iaddr + 1, &newsav->natt_oa_src, ((const struct sockaddr *)(iaddr + 1))->sa_len);
+	if (raddr)
+		bcopy(raddr + 1, &newsav->natt_oa_dst, ((const struct sockaddr *)(raddr + 1))->sa_len);
+	if (newsav->sah->saidx.src.sa.sa_family == AF_INET) {
+		struct sockaddr *natt_src_sa = iaddr ? &newsav->natt_oa_src.sa : NULL;
+		struct sockaddr *natt_dst_sa = raddr ? &newsav->natt_oa_dst.sa : NULL;
+		newsav->natt_cksum = key_compute_natt_cksum(&newsav->sah->saidx.src.sa,
+		    &newsav->sah->saidx.dst.sa, natt_src_sa, natt_dst_sa);
+	}
 
 #if 0
 	/*
--- keydb.h.orig	2009-08-03 12:13:06.000000000 +0400
+++ keydb.h	2010-04-30 12:09:55.000000000 +0400
@@ -157,6 +157,9 @@
 	 */
 	u_int16_t natt_type;		/* IKE/ESP-marker in output. */
 	u_int16_t natt_esp_frag_len;	/* MTU for payload fragmentation. */
+	union sockaddr_union natt_oa_src; /* NATT source address */
+	union sockaddr_union natt_oa_dst; /* NATT destination address */
+	u_int16_t natt_cksum;             /* checksum delta for inbound packets */
 };
 
 #define	SECASVAR_LOCK_INIT(_sav) \
--- xform_esp.c.orig	2010-04-30 19:43:50.000000000 +0400
+++ xform_esp.c	2010-04-30 12:19:36.000000000 +0400
@@ -78,12 +78,16 @@
 
 VNET_DEFINE(int, esp_enable) = 1;
 VNET_DEFINE(struct espstat, espstat);
+VNET_DEFINE(int, esp_ignore_natt_cksum) = 0;
 
 SYSCTL_DECL(_net_inet_esp);
 SYSCTL_VNET_INT(_net_inet_esp, OID_AUTO,
 	esp_enable,	CTLFLAG_RW,	&VNET_NAME(esp_enable),	0, "");
 SYSCTL_VNET_STRUCT(_net_inet_esp, IPSECCTL_STATS,
 	stats,		CTLFLAG_RD,	&VNET_NAME(espstat),	espstat, "");
+SYSCTL_VNET_INT(_net_inet_esp, OID_AUTO,
+	esp_ignore_natt_cksum,	CTLFLAG_RW,	&VNET_NAME(esp_ignore_natt_cksum), 0, 
+	"Do not validate checksums of ESP protected packets in case of NAT-T");
 
 /* max iv length over all algorithms */
 static VNET_DEFINE(int, esp_max_ivlen) = 0;


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Fri Apr 30 19:13:57 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146190 
Responsible-Changed-From-To: freebsd-net->vanhu 
Responsible-Changed-By: vanhu 
Responsible-Changed-When: Mon May 3 07:57:47 UTC 2010 
Responsible-Changed-Why:  
Taking it; I'll also handle the userland (racoon) part. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=146190 

From: Denis Antrushin <Denis.Antrushin@Sun.COM>
To: bug-followup@FreeBSD.org, DAntrushin@mail.ru
Cc:  
Subject: Re: kern/146190: [ipsec][patch] NAT traversal does not work in
 transport mode
Date: Thu, 06 May 2010 23:56:36 +0400

 There is one problem with this patch on an SMP system: the packet's checksum
 is fixed properly, but somehow the source and destination ports in the
 security policy index created from the mbuf (ipsec4_get_ulp()) get corrupted:
 they become 0x4000 and 0x0000 (always the same) instead of the actual port
 numbers. As a result, the incoming packet is rejected by the kernel
 (my config has a few ports configured for IPSEC and the rest is denied).
 If I remove the 'deny all' rule, the IPSEC connection works OK.
 Also, a couple of garbled debug messages appear in the log just before
 the bad SP index appears:
 
 kernel: DP key_freesav SA:0xffffff002d5eec00 (SPI 2540512i5p9s)e frcom 
 46/_usirn/_srrecj/escyts:/ nmebtuif =p 
 sec/xf0oxrfmf_fefsfp.fc:00603f3d7e500;,  irnepcfbc n= t now0 
 xf2fffff002d1667e0
 
 On a uniprocessor system everything works OK, so this looks like a race
 condition, which I don't understand: how could the same mbuf be
 processed in parallel by two threads? So far I've been unable to
 figure out what's happening here...
 

From: Denis Antrushin <Denis.Antrushin@Sun.COM>
To: bug-followup@FreeBSD.org, DAntrushin@mail.ru
Cc:  
Subject: Re: kern/146190: [ipsec][patch] NAT traversal does not work in
 transport mode
Date: Fri, 14 May 2010 20:34:18 +0400

 Please ignore my previous comment.
 The same issue exists on a uniprocessor system as well.
 Those bogus port numbers are, in fact, not from the TCP header but are
 pieces of the IP header.
 The problem is that tcp_input() (partially) zeroes out the IP header when
 computing the TCP checksum and then calls the IPSEC code. But IPSEC uses
 the IP header length field to locate the TCP/UDP port numbers in the mbuf
 (ipsec4_get_ulp()). With a zeroed ip_hl field, it reads the IP header
 instead of the TCP or UDP header.
 
 I don't know how to add a new patch to an existing PR, so I've put it here:
 http://den.homeunix.org/public_html/freebsd/ipsec_natt.v4.diff
 
 Also, a slightly improved userland (racoon) patch is here:
 http://den.homeunix.org/public_html/freebsd/ipsec_tools.context.v2.diff
>Unformatted:
