From nobody@FreeBSD.org  Sat Feb 25 14:24:43 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C334A16A420
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 25 Feb 2006 14:24:43 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5503643D7F
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 25 Feb 2006 14:24:25 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k1PEOPMS034304
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 25 Feb 2006 14:24:25 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k1PEOPnK034302;
	Sat, 25 Feb 2006 14:24:25 GMT
	(envelope-from nobody)
Message-Id: <200602251424.k1PEOPnK034302@www.freebsd.org>
Date: Sat, 25 Feb 2006 14:24:25 GMT
From: "C.Dornig" <c_dornig@gmx.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Pfsync state time problem with CARP + Arp.Balance
X-Send-Pr-Version: www-2.3

>Number:         93829
>Category:       kern
>Synopsis:       [carp] pfsync state time problem with CARP + Arp.Balance
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    glebius
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Feb 25 14:30:04 GMT 2006
>Closed-Date:    Thu Aug 10 10:17:01 GMT 2006
>Last-Modified:  Thu Aug 10 10:17:01 GMT 2006
>Originator:     C.Dornig
>Release:        6.0 Release
>Organization:
none
>Environment:
FreeBSD fw-cluster-1 6.0-RELEASE FreeBSD 6.0-RELEASE #4: Thu Feb 23 15:01:55 CET 2006     root@t-fw-cluster01:/usr/obj/usr/src/sys/CD-UNIX  i386

and

FreeBSD fw-cluster-2 6.0-RELEASE FreeBSD 6.0-RELEASE #4: Thu Feb 23 15:01:55 CET 2006     root@t-fw-cluster01:/usr/obj/usr/src/sys/CD-UNIX  i386

>Description:
Hi,


I have a problem with CARP + pf + pfsync in arp.balance mode.
I have configured two clustered routing / packet-filtering machines with
CARP and ARP balancing.

The pf rules are the same on both servers.  If the servers run without
arp.balance, the rules all work perfectly.

But if I turn on arp.balance, I get the following problem.

I send a ping (ICMP packet) from my client PC (client LAN) to the server
behind the pf cluster in the other LAN.  The first packet goes through
PFCluster1 and the reply goes through PFCluster2.  But the state
information for the first packet does not reach PFCluster2 fast enough,
so the pf rules there block the reply.  For the next packet from client
to server, the reply is passed as well.

Without arp.balance the rules are fine: all traffic is passed and the
states are created correctly.  Plain routing without pf also works.

I have run all the network diagnostics I could.  I ran tcpdump on all
interfaces, and the CARP interfaces are all OK.  pfsync packets are also
sent and received by each machine.  The two machines can send packets to
and receive packets from each other.

I think there is a timing problem in pfsync: it sends state changes to
the other machine too slowly.
>How-To-Repeat:
To reproduce, set up two machines with the following configuration.

Each machine needs two network interfaces.

IP Range LAN1: 
1.1.0.0/16 on interface em1

IP Range LAN2:
10.1.127.101 on interface em0 - management IP.
10.3.155.0/24 on interface vlan155 -> em0

PFsync:
15.1.1.0/24 pfsync on fxp0 with crossover cable to machine 2.

Carps:
1.1.10.50
and 
10.3.155.254

Gateway on side 1:
1.1.10.50
Gateway on side 2: 
10.3.155.254



ifconfig output from machine 1:

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet6 fe80::240:d0ff:fe43:d986%em0 prefixlen 64 scopeid 0x1
        inet 10.1.127.101 netmask 0xffffff00 broadcast 10.1.127.255
        ether 00:40:d0:43:d9:86
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet6 fe80::250:8bff:fe66:9274%fxp0 prefixlen 64 scopeid 0x2
        inet 15.1.1.1 netmask 0xffffff00 broadcast 15.1.1.255
        ether 00:50:8b:66:92:74
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
fxp1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        ether 00:50:8b:66:92:75
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet6 fe80::240:d0ff:fe43:d987%em1 prefixlen 64 scopeid 0x4
        inet 1.1.10.101 netmask 0xffff0000 broadcast 1.1.255.255
        ether 00:40:d0:43:d9:87
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
pfsync0: flags=41<UP,RUNNING> mtu 1348
        pfsync: syncdev: fxp0 syncpeer: 15.1.1.2 maxupd: 128
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33208
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
        inet 127.0.0.1 netmask 0xff000000
vlan155: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        inet 10.3.155.10 netmask 0xffffff00 broadcast 10.3.155.255
        inet6 fe80::240:d0ff:fe43:d986%vlan155 prefixlen 64 scopeid 0x8
        ether 00:40:d0:43:d9:86
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        vlan: 155 parent interface: em0
carp100: flags=41<UP,RUNNING> mtu 1500
        inet 1.1.10.50 netmask 0xffffff00
        carp: MASTER vhid 10 advbase 1 advskew 0
carp101: flags=41<UP,RUNNING> mtu 1500
        inet 1.1.10.50 netmask 0xffffff00
        carp: BACKUP vhid 11 advbase 1 advskew 100
carp1551: flags=41<UP,RUNNING> mtu 1500
        inet 10.3.155.254 netmask 0xffffff00
        carp: BACKUP vhid 155 advbase 1 advskew 100
carp1552: flags=41<UP,RUNNING> mtu 1500
        inet 10.3.155.254 netmask 0xffffff00
        carp: MASTER vhid 255 advbase 1 advskew 0

ifconfig output from machine 2:

em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet6 fe80::240:d0ff:fe43:d986%em0 prefixlen 64 scopeid 0x1
        inet 10.1.127.102 netmask 0xffffff00 broadcast 10.1.127.255
        ether 00:40:d0:43:d9:86
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet6 fe80::250:8bff:fe66:9274%fxp0 prefixlen 64 scopeid 0x2
        inet 15.1.1.2 netmask 0xffffff00 broadcast 15.1.1.255
        ether 00:50:8b:66:92:74
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
fxp1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        ether 00:50:8b:66:92:75
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        inet6 fe80::240:d0ff:fe43:d987%em1 prefixlen 64 scopeid 0x4
        inet 1.1.10.102 netmask 0xffff0000 broadcast 1.1.255.255
        ether 00:40:d0:43:d9:87
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
pfsync0: flags=41<UP,RUNNING> mtu 1348
        pfsync: syncdev: fxp0 syncpeer: 15.1.1.1 maxupd: 128
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33208
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
        inet 127.0.0.1 netmask 0xff000000
vlan155: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        inet 10.3.155.11 netmask 0xffffff00 broadcast 10.3.155.255
        inet6 fe80::240:d0ff:fe43:d986%vlan155 prefixlen 64 scopeid 0x8
        ether 00:40:d0:43:d9:86
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        vlan: 155 parent interface: em0
carp100: flags=41<UP,RUNNING> mtu 1500
        inet 1.1.10.50 netmask 0xffffff00
        carp: BACKUP vhid 10 advbase 1 advskew 100
carp101: flags=41<UP,RUNNING> mtu 1500
        inet 1.1.10.50 netmask 0xffffff00
        carp: MASTER vhid 11 advbase 1 advskew 0
carp1551: flags=41<UP,RUNNING> mtu 1500
        inet 10.3.155.254 netmask 0xffffff00
        carp: MASTER vhid 155 advbase 1 advskew 0
carp1552: flags=41<UP,RUNNING> mtu 1500
        inet 10.3.155.254 netmask 0xffffff00
        carp: BACKUP vhid 255 advbase 1 advskew 100

Then you need a small pf.conf, which is the same on both machines:

table <MANAGE> { 10.1.127.101 , 10.1.127.102 }


block log-all all
pass quick on lo0 inet from 127.0.0.1 to 127.0.0.1 keep state

### Pfsync Rule
pass quick on { em1 } proto pfsync
### CARP Rule
pass quick proto carp keep state

pass out log-all on em1 inet from 10.3.155.0/24  to 1.1.0.0/16 keep state
pass in quick log-all on em1 inet proto tcp  from 1.1.0.0/16  to <MANAGE> port 22 keep state

pass in quick log-all on vlan155 inet  from 10.3.155.0/24  to any keep state
pass out quick log-all inet from any  to any keep state


Then you need two test machines.
Test machine 1, in LAN1:
1.1.XXX.YYY/16, gateway 1.1.10.50
Test machine 2, in LAN2:
10.3.155.XXX/24, gateway 10.3.155.254 -> on an untagged VLAN port.

Now you can test a ping from one test machine to the other.
Test machine 1 must have the ARP entry (MAC address) of gateway 1.
Test machine 2 must have the ARP entry (MAC address) of gateway 2.
You can reproduce my problem only if the two test machines hold gateway
MAC addresses that belong to different cluster machines.
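
The condition above can be sketched in a few lines of Python. This is only an illustration, not part of the original report: the vhid-to-master table is copied from the ifconfig output above, the virtual MAC is assumed to follow the usual CARP form 00:00:5e:00:01:<vhid>, and the names MASTER_OF, carp_mac and ping_path are hypothetical.

```python
# Which cluster machine is MASTER for each vhid (taken from the ifconfig output).
MASTER_OF = {
    10: "fw-cluster-1",   # carp100,  1.1.10.50
    11: "fw-cluster-2",   # carp101,  1.1.10.50
    155: "fw-cluster-2",  # carp1551, 10.3.155.254
    255: "fw-cluster-1",  # carp1552, 10.3.155.254
}


def carp_mac(vhid):
    """Virtual MAC a host would see in its ARP table (assumed CARP format)."""
    return "00:00:5e:00:01:%02x" % vhid


def ping_path(client_gw_vhid, server_gw_vhid):
    """Return (forward router, return router, asymmetric?) for one ping.

    client_gw_vhid: vhid whose MAC test machine 1 holds for 1.1.10.50.
    server_gw_vhid: vhid whose MAC test machine 2 holds for 10.3.155.254.
    """
    forward = MASTER_OF[client_gw_vhid]
    back = MASTER_OF[server_gw_vhid]
    return forward, back, forward != back


if __name__ == "__main__":
    for client_vhid in (10, 11):
        for server_vhid in (155, 255):
            print(carp_mac(client_vhid), carp_mac(server_vhid),
                  ping_path(client_vhid, server_vhid))
```

Under these assumptions, only the combinations where the two ARP entries point at different cluster machines send the reply through a router that did not create the state.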



>Fix:
I think the source code must be changed so that pfsync sends and receives
state updates more often.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-pf 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun Feb 26 07:25:45 UTC 2006 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=93829 

From: Gleb Smirnoff <glebius@FreeBSD.org>
To: "C.Dornig" <c_dornig@gmx.de>
Cc: mlaier@FreeBSD.org, dhartmei@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/93829: Pfsync state time problem with CARP + Arp.Balance
Date: Sun, 26 Feb 2006 14:08:43 +0300

 On Sat, Feb 25, 2006 at 02:24:25PM +0000, C.Dornig wrote:
 C> I have a problem with CARP + pf + pfsync in arp.balance mode.
 C> I have configured two clustered routing / packet-filtering machines with CARP and ARP balancing.
 C> 
 C> The pf rules are the same on both servers.
 C> If the servers run without arp.balance, the rules all work perfectly.
 C> But if I turn on arp.balance, I get the following problem.
 C> I send a ping (ICMP packet) from my client PC (client LAN) to the server behind the pf cluster in the other LAN.
 C> The first packet goes through PFCluster1 and the reply goes through PFCluster2. But the state information for the first packet does not reach PFCluster2 fast enough, so the pf rules there block the reply. For the next packet from client to server, the reply is passed as well.
 C> 
 C> Without arp.balance the rules are fine: all traffic is passed and the states are created correctly. Plain routing without pf also works.
 C> 
 C> I have run all the network diagnostics I could. I ran tcpdump on all interfaces, and the CARP interfaces are all OK. pfsync packets are also sent and received by each machine. The two machines can send packets to and receive packets from each other.
 C> 
 C> I think there is a timing problem in pfsync: it sends state changes to the other machine too slowly.
 
 You have a race between three computers: the two CARP routers and the host
 behind them. The ICMP packet can reach the host and be answered faster
 than the state information is sent from one CARP router to the other. I think
 this problem is not solvable at all, so we must state that ARP load balancing
 is not compatible with pfsync(4).
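
The race described above can be illustrated with a small timing sketch in Python. It is a simplified model under stated assumptions, not a description of how pfsync(4) schedules its updates: the first router is assumed to create the state at t=0, the state is assumed to reach the peer after sync_delay_ms, and each echo reply is assumed to arrive at the peer round_trip_ms after its request. The function name and all parameters are hypothetical.

```python
def simulate_pings(count, sync_delay_ms, round_trip_ms, interval_ms=1000.0):
    """Return one bool per ping: True if the echo reply passes the second router.

    Assumptions (not from the report): router 1 creates the state when the
    first echo request arrives at t=0, the state reaches router 2
    sync_delay_ms later, and later pings of the same run reuse that state.
    """
    state_on_peer_at = sync_delay_ms
    passed = []
    for i in range(count):
        # Time at which the reply to ping number i arrives at router 2.
        reply_at = i * interval_ms + round_trip_ms
        # The reply passes only if the synced state is already installed.
        passed.append(reply_at >= state_on_peer_at)
    return passed


if __name__ == "__main__":
    # Reply (1 ms) beats the state update (5 ms): first reply is dropped.
    print(simulate_pings(3, sync_delay_ms=5.0, round_trip_ms=1.0))
    # State update (0.5 ms) beats the reply (1 ms): every reply passes.
    print(simulate_pings(3, sync_delay_ms=0.5, round_trip_ms=1.0))
```

In this model the first reply is lost whenever the host answers faster than the state is transferred, while later replies pass, which matches the behaviour described in the report.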
 
 
 -- 
 Totus tuus, Glebius.
 GLEBIUS-RIPN GLEB-RIPE

From: "Bill Marquette" <bill.marquette@gmail.com>
To: "Jon Simola" <jon@abccomm.com>
Cc: freebsd-pf@freebsd.org, bug-followup@FreeBSD.org
Subject: Re: kern/93829: [carp] pfsync state time problem with CARP + Arp.Balance
Date: Sun, 26 Feb 2006 10:02:34 -0600

 On 2/26/06, Jon Simola <jon@abccomm.com> wrote:
 > On 2/25/06, Mark Linimon <linimon@freebsd.org> wrote:
 >
 > > http://www.freebsd.org/cgi/query-pr.cgi?pr=93829
 >
 > > pfsync0: flags=41<UP,RUNNING> mtu 1348
 > >        pfsync: syncdev: fxp0 syncpeer: 15.1.1.1 maxupd: 128
 >
 > > ### Pfsync Rule
 > > pass quick on { em1 } proto pfsync
 >
 > This problem seems obvious.
 
 Yep, looks like user error in this case.  However, I've seen this
 happen when I've accidentally had carp mismatches such that my
 firewalls were also seeing an asymmetric traffic stream.  The hazard
 of fast networks (and possibly slow machines) I'm afraid.
 
 --Bill

From: Max Laier <max@love2party.net>
To: bug-followup@freebsd.org,
 c_dornig@gmx.de
Cc:  
Subject: Re: kern/93829: [carp] pfsync state time problem with CARP + Arp.Balance
Date: Fri, 2 Jun 2006 09:45:20 +0200

 Spring cleaning: Can this be closed now?  Should we chown to doc?  I'm not 
 sure where or how to document it, though.
 
 --
  Max
State-Changed-From-To: open->patched 
State-Changed-By: glebius 
State-Changed-When: Wed Jun 7 10:28:13 UTC 2006 
State-Changed-Why:  
I have documented why this setup can't work in carp(4). 


Responsible-Changed-From-To: freebsd-pf->glebius 
Responsible-Changed-By: glebius 
Responsible-Changed-When: Wed Jun 7 10:28:13 UTC 2006 
Responsible-Changed-Why:  
I have documented why this setup can't work in carp(4). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=93829 
State-Changed-From-To: patched->closed 
State-Changed-By: glebius 
State-Changed-When: Thu Aug 10 10:16:03 UTC 2006 
State-Changed-Why:  
Documentation changes merged to RELENG_6. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=93829 
>Unformatted:
