From nobody@FreeBSD.org  Sat Mar 15 18:50:27 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BD285106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 15 Mar 2008 18:50:27 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id C4EB68FC15
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 15 Mar 2008 18:50:27 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m2FIl3Z8022530
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 15 Mar 2008 18:47:03 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m2FIl3U7022529;
	Sat, 15 Mar 2008 18:47:03 GMT
	(envelope-from nobody)
Message-Id: <200803151847.m2FIl3U7022529@www.freebsd.org>
Date: Sat, 15 Mar 2008 18:47:03 GMT
From: Alexander Zagrebin <alexz@visp.ru>
To: freebsd-gnats-submit@FreeBSD.org
Subject: ipfw in-kernel nat loses fragmented packets
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         121743
>Category:       kern
>Synopsis:       [netinet] [patch] ipfw in-kernel nat loses fragmented packets
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    piso
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Mar 15 19:00:05 UTC 2008
>Closed-Date:    Sun Jun 01 12:46:43 UTC 2008
>Last-Modified:  Sun Jun 01 12:46:43 UTC 2008
>Originator:     Alexander Zagrebin
>Release:        7.0-RELEASE
>Organization:
-
>Environment:
FreeBSD <hidden> 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sat Mar 15 19:18:40 MSK 2008     alex@<hidden>:/usr/src/sys/i386/compile/KERNEL  i386
>Description:
When using ipfw in-kernel nat, I observed that it loses
fragmented packets (see section "How to repeat the problem").
The problem is in the current ipfw code (sys/netinet/ip_fw2.c).
After processing a packet with LibAliasIn, ipfw checks the return code and
drops the packet if retcode != PKT_ALIAS_OK.
But when LibAliasIn processes the first fragment of a packet, it returns
PKT_ALIAS_FOUND_HEADER_FRAGMENT instead. This code is not an error, so ipfw
should take it into account when deciding whether to pass or drop a packet.
(See the patch at "Fix to the problem if known")
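
The pass/drop decision described above can be sketched in isolation. This is only an illustration, not the ip_fw2.c code: the helper nat_result_passes() and the SKETCH_ names are hypothetical, and the numeric values are copied from libalias' alias.h as I recall them, so they should be checked against the real header.

```c
#include <stdbool.h>

/*
 * Return codes as recalled from libalias' alias.h (an assumption);
 * redefined under hypothetical SKETCH_ names so the sketch builds alone.
 */
enum sketch_alias_rc {
	SKETCH_ALIAS_ERROR = -1,
	SKETCH_ALIAS_OK = 1,
	SKETCH_ALIAS_IGNORED = 2,
	SKETCH_ALIAS_UNRESOLVED_FRAGMENT = 3,
	SKETCH_ALIAS_FOUND_HEADER_FRAGMENT = 4
};

/*
 * Hypothetical helper: returns true when the packet should continue past
 * the nat rule. The header fragment code is accepted in addition to OK,
 * which is the same condition the proposed patch introduces.
 */
static bool
nat_result_passes(int retval)
{
	return (retval == SKETCH_ALIAS_OK ||
	    retval == SKETCH_ALIAS_FOUND_HEADER_FRAGMENT);
}
```

With the unpatched check (OK only), the first fragment of every fragmented reply is dropped, while later fragments that already match the translation entry still pass.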

Also, libalias(3) (see section "FRAGMENT HANDLING") suggests processing fragmented packets via LibAliasSaveFragment, LibAliasGetFragment and LibAliasFragmentIn, but neither the ipfw nat code nor user-space natd uses this method now. It can be important if a packet's fragments are reordered in transit.
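
The idea behind that method can be shown with a toy model: hold fragments that arrive before their header fragment, and release them once the header has been seen. The sketch below does not call libalias at all. The types and the functions stash_save() and stash_release() are hypothetical stand-ins for what a caller might do around LibAliasSaveFragment, LibAliasGetFragment and LibAliasFragmentIn.

```c
#include <stddef.h>
#include <stdint.h>

#define STASH_MAX 8	/* arbitrary limit chosen for the sketch */

/* Hypothetical, minimal view of a fragment: IP id, offset, status. */
struct frag {
	uint16_t ip_id;
	uint16_t offset;
	int forwarded;
};

/* Hypothetical holding area for fragments seen before their header. */
struct stash {
	struct frag *held[STASH_MAX];
	size_t count;
};

/* Hold a non-header fragment. Returns 0 on success, -1 if the stash is full. */
static int
stash_save(struct stash *s, struct frag *f)
{
	if (s->count >= STASH_MAX)
		return (-1);
	s->held[s->count++] = f;
	return (0);
}

/*
 * The header fragment with this IP id has been translated: mark every held
 * fragment with the same id as forwarded and drop it from the stash.
 * Returns the number of fragments released.
 */
static size_t
stash_release(struct stash *s, uint16_t ip_id)
{
	size_t i, kept = 0, released = 0;

	for (i = 0; i < s->count; i++) {
		if (s->held[i]->ip_id == ip_id) {
			s->held[i]->forwarded = 1;
			released++;
		} else {
			s->held[kept++] = s->held[i];
		}
	}
	s->count = kept;
	return (released);
}
```

A real implementation would also need a timeout for held fragments whose header never arrives. The sketch leaves that out.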
>How-To-Repeat:
My internal network is 192.168.1.0/24
External (public) network is 10.0.0.0/8
External interface (xl0) has address 10.255.255.2

Add to kernel config:
options         IPFIREWALL
options         IPFIREWALL_NAT
options         LIBALIAS

Add to ipfw config (xl0 is the external interface):
..
nat 1 config if xl0 log reset same_ports
add 999 count log ip from any to any via xl0
add 1000 nat 1 ip from any to any via xl0
add 1001 count log ip from any to any via xl0
..

To log packets after nat:
# sysctl net.inet.ip.fw.one_pass=0

Try to ping the external host (10.0.0.1 in my case) from an internal address,
and use packets with a size greater than MTU:

# ping -S 192.168.1.1 -s 2000 <some_external_host>
PING <some_external_host> from 192.168.1.1: 2000 data bytes
^C
--- 10.0.0.1 ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss

So, ping fails.

See /var/log/security (my comments are marked with >>>):
..
>Fix:
--- sys/netinet/ip_fw2.c.orig   2008-02-28 11:28:09.000000000 +0300
+++ sys/netinet/ip_fw2.c        2008-03-15 18:41:52.000000000 +0300
@@ -3568,7 +3568,8 @@
                                else
                                        retval = LibAliasOut(t->lib, c,
                                            MCLBYTES);
-                               if (retval != PKT_ALIAS_OK) {
+                               if (retval != PKT_ALIAS_OK &&
+                                   retval != PKT_ALIAS_FOUND_HEADER_FRAGMENT) {
                                        /* XXX - should i add some logging? */
                                        m_free(mcl);
                                badnat:

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-ipfw 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun Mar 16 04:09:11 UTC 2008 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=121743 

From: Vadim Goncharov <vadim_nuclight@mail.ru>
To: Alexander Zagrebin <alexz@visp.ru>
Cc: bug-followup@freebsd.org
Subject: Re: kern/121743: ipfw in-kernel nat loses fragmented packets
Date: Mon, 17 Mar 2008 15:19:38 +0600

 Hi Alexander Zagrebin! 
 
 On Sat, 15 Mar 2008 18:47:03 GMT; Alexander Zagrebin <alexz@visp.ru> wrote:
 
 >>Fix:
 > --- sys/netinet/ip_fw2.c.orig   2008-02-28 11:28:09.000000000 +0300
 > +++ sys/netinet/ip_fw2.c        2008-03-15 18:41:52.000000000 +0300
 > @@ -3568,7 +3568,8 @@
 >                                 else
 >                                         retval = LibAliasOut(t->lib, c,
 >                                             MCLBYTES);
 > -                               if (retval != PKT_ALIAS_OK) {
 > +                               if (retval != PKT_ALIAS_OK &&
 > +                                   retval != PKT_ALIAS_FOUND_HEADER_FRAGMENT) {
 >                                         /* XXX - should i add some logging? */
 >                                         m_free(mcl);
 >                                 badnat:
 
 This is not so simple to fix, as the LibAlias API requires the caller to save
 packet fragments somewhere and feed them all back at some later time. And the
 kernel infrastructure is currently not well suited to that packet storage.
 
 As a workaround you can currently send packets with some ipfw rule before NAT
 to a divert socket on which ng_ksocket listens and returns packets back with
 ng_echo (thus packets won't leave the kernel), as divert sockets do packet
 reassembly.
 
 -- 
 WBR, Vadim Goncharov. ICQ#166852181       mailto:vadim_nuclight@mail.ru
 [Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight]

From: "Alexander Zagrebin" <alexz@visp.ru>
To: <vadim_nuclight@mail.ru>
Cc: <bug-followup@freebsd.org>
Subject: RE: kern/121743: ipfw in-kernel nat loses fragmented packets
Date: Mon, 17 Mar 2008 14:32:23 +0300

 > On Sat, 15 Mar 2008 18:47:03 GMT; Alexander Zagrebin 
 > <alexz@visp.ru> wrote:
 > 
 > >>Fix:
 > > --- sys/netinet/ip_fw2.c.orig   2008-02-28 11:28:09.000000000 +0300
 > > +++ sys/netinet/ip_fw2.c        2008-03-15 18:41:52.000000000 +0300
 > > @@ -3568,7 +3568,8 @@
 > >                                 else
 > >                                         retval = 
 > LibAliasOut(t->lib, c,
 > >                                             MCLBYTES);
 > > -                               if (retval != PKT_ALIAS_OK) {
 > > +                               if (retval != PKT_ALIAS_OK &&
 > > +                                   retval != 
 > PKT_ALIAS_FOUND_HEADER_FRAGMENT) {
 > >                                         /* XXX - should i 
 > add some logging? */
 > >                                         m_free(mcl);
 > >                                 badnat:
 > 
 > This is not so simple to fix as LibAlias API requires caller 
 > to save packet
 > fragments somewhere and then at some time to feed them all 
 > back. And kernel
 > infrastructure currently is not so suitable for that packet storage.
 
 /sbin/natd doesn't use this method either. But it is in the source tree and works.
 This patch will work in most cases.
 It is better to work with a bad patch than to not work at all.
 
 > As a workaround you can currently send packets with some ipfw 
 > rule before NAT
 > to a divert socket on which ng_ksocket listens and returns 
 > packets back with
 > ng_echo (thus packets won't leave kernel), as divert sockets do packet
 > reassembly.
 
 So ng_ksocket has kernel memory for a fragmented packet's buffer, but libalias
 does not? :)
 

From: Vadim Goncharov <vadim_nuclight@mail.ru>
To: Alexander Zagrebin <alexz@visp.ru>
Cc: bug-followup@freebsd.org
Subject: Re: kern/121743: ipfw in-kernel nat loses fragmented packets
Date: Mon, 17 Mar 2008 18:23:02 +0600

 Hi Alexander Zagrebin! 
 
 On Mon, 17 Mar 2008 12:10:02 GMT; Alexander Zagrebin <alexz@visp.ru> wrote:
 
 >>> --- sys/netinet/ip_fw2.c.orig   2008-02-28 11:28:09.000000000 +0300
 >>> +++ sys/netinet/ip_fw2.c        2008-03-15 18:41:52.000000000 +0300
 >>> @@ -3568,7 +3568,8 @@
 >>>                                 else
 >>>                                         retval = 
 >> LibAliasOut(t->lib, c,
 >>>                                             MCLBYTES);
 >>> -                               if (retval != PKT_ALIAS_OK) {
 >>> +                               if (retval != PKT_ALIAS_OK &&
 >>> +                                   retval != 
 >> PKT_ALIAS_FOUND_HEADER_FRAGMENT) {
 >>>                                         /* XXX - should i 
 >> add some logging? */
 >>>                                         m_free(mcl);
 >>>                                 badnat:
 >> 
 >> This is not so simple to fix as LibAlias API requires caller 
 >> to save packet
 >> fragments somewhere and then at some time to feed them all 
 >> back. And kernel
 >> infrastructure currently is not so suitable for that packet storage.
 >  /sbin/natd doesn't use this method too. But it is in source tree and works.
 
 natd(8) relies on a divert(4) socket to do reassembly, again in the kernel - and
 ppp(8) actually uses this method.
 
 >  This patch will work at most cases.
 >  It is better to work with a bad patch, than to not work absolutely.
 
 No, that's not the FreeBSD way. Especially when you have a workaround available.
   
 >> As a workaround you can currently send packets with some ipfw 
 >> rule before NAT
 >> to a divert socket on which ng_ksocket listens and returns 
 >> packets back with
 >> ng_echo (thus packets won't leave kernel), as divert sockets do packet
 >> reassembly.
 >  So ng_ksocket has kernel memory for fragmented packet's buffer, but libalias
 >  not? :)
 
 Yes, because libalias(3) was developed more than 10 years for ppp(8), and was
 never meant to be ported to the kernel (it still has many quirks). Kernel
 sockets, and divert(4) as well, all use finite reassembly space for packets
 destined to this machine. This is not a problem with natd(8) as it is not so
 fast, but for more intensive solutions with in-kernel libalias a better
 solution should be found.
 
 -- 
 WBR, Vadim Goncharov. ICQ#166852181       mailto:vadim_nuclight@mail.ru
 [Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight]
Responsible-Changed-From-To: freebsd-ipfw->piso 
Responsible-Changed-By: piso 
Responsible-Changed-When: Tue Mar 18 14:49:49 UTC 2008 
Responsible-Changed-Why:  
Assign it to me. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=121743 
State-Changed-From-To: open->closed 
State-Changed-By: mav 
State-Changed-When: Sun Jun 1 12:38:23 UTC 2008 
State-Changed-Why:  
The patch has been committed to HEAD. 
The same patch was applied to ng_nat some time ago and has proven to work. 
This solution is far from ideal, as it does not handle the case where, 
due to fragment reordering, the initial fragment does not arrive first. 
But it is surely much better than nothing. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=121743 
>Unformatted:
 >>> Our outbound ICMP echo request before NAT
 Mar 13 10:39:00 gw kernel: ipfw: 999 Count ICMP:8.0 192.168.1.1 10.0.0.1 out via xl0
 >>> Our outbound ICMP echo request after NAT
 Mar 13 10:39:00 gw kernel: ipfw: 1001 Count ICMP:8.0 10.255.255.2 10.0.0.1 out via xl0
 >>> ICMP echo reply (fragment 1) before NAT
 Mar 13 10:39:00 gw kernel: ipfw: 999 Count ICMP:0.0 10.0.0.1 10.255.255.2 in via xl0 (frag 20433:1480@0+)
 >>> ICMP echo reply (fragment 2) before NAT
 Mar 13 10:39:00 gw kernel: ipfw: 999 Count ICMP 10.0.0.1 10.255.255.2 in via xl0 (frag 20433:528@1480)
 >>> (!) ICMP echo reply (fragment 1) IS LOST!
 >>> ICMP echo reply (fragment 2) after NAT
 Mar 13 10:39:00 gw kernel: ipfw: 1001 Count ICMP 10.0.0.1 192.168.1.1 in via xl0 (frag 20433:528@1480)
 ..
 Mar 13 10:39:01 gw kernel: ipfw: 999 Count ICMP:8.0 192.168.1.1 10.0.0.1 out via xl0
 Mar 13 10:39:01 gw kernel: ipfw: 1001 Count ICMP:8.0 10.255.255.2 10.0.0.1 out via xl0
 Mar 13 10:39:01 gw kernel: ipfw: 999 Count ICMP:0.0 10.0.0.1 10.255.255.2 in via xl0 (frag 21967:1480@0+)
 Mar 13 10:39:01 gw kernel: ipfw: 999 Count ICMP 10.0.0.1 10.255.255.2 in via xl0 (frag 21967:528@1480)
 Mar 13 10:39:01 gw kernel: ipfw: 1001 Count ICMP 10.0.0.1 192.168.1.1 in via xl0 (frag 21967:528@1480)
 ..
 
