From delphij@tarsier.delphij.net  Mon Apr 10 01:49:39 2006
Return-Path: <delphij@tarsier.delphij.net>
Message-Id: <200604100149.k3A1nI1Y074308@tarsier.delphij.net>
Date: Mon, 10 Apr 2006 09:49:18 +0800 (CST)
From: Xin LI <delphij@freebsd.org>
Reply-To: Xin LI <delphij@freebsd.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [RELENG_6] write(2) fails with EPERM on TCP socket under certain situations
X-Send-Pr-Version: 3.113
X-GNATS-Notify: gnn@FreeBSD.org, rwatson@FreeBSD.org, mlaier@FreeBSD.org

>Number:         95559
>Category:       kern
>Synopsis:       RELENG_6: write(2) fails with EPERM on TCP socket under certain situations
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 10 01:50:14 GMT 2006
>Closed-Date:    
>Last-Modified:  Wed Apr 19 11:40:12 GMT 2006
>Originator:     Xin LI
>Release:        FreeBSD 6.1-RC i386
>Organization:
The FreeBSD Project
>Environment:
System: FreeBSD tarsier.delphij.net 6.1-RC FreeBSD 6.1-RC #26: Sun Apr 9 04:27:53 CST 2006 delphij@tarsier.delphij.net:/usr/obj/usr/src/sys/TARSIER i386

>Description:
	With a two-rule ruleset in pf.conf, a connection from a cvsup client
inside a jail to the cvsupd running on the host fails: cvsupd (on the
host) dies because write(2) on the TCP socket suddenly returns EPERM.

	The box has both pf(4) and ipfw(4) loaded: pf(4) has the two
rules shown below, while ipfw(4) has an empty ruleset with
a default accept rule.
>How-To-Repeat:

	First, load the following ruleset into pf(4):

--- pf.conf begins here ---
scrub reassemble tcp random-id
set skip on lo0
--- pf.conf ends here ---

	Second, run a cvsupd daemon on the host.

	Third, set up a jail and try to transfer a large amount of data
from the host.

	A ktrace dump is available at:
		http://www.delphij.net/kdump.txt.bz2

	Please note that the dump is big (about 7MB).

>Fix:

	Removing either rule from pf.conf seems to work around the
issue.  However, we grepped the netinet and pf code for EPERM
and found no plausible reason why write(2) would return EPERM
in this code path.
>Release-Note:
>Audit-Trail:

From: Max Laier <max@love2party.net>
To: bug-followup@freebsd.org,
 delphij@freebsd.org
Cc:  
Subject: Re: kern/95559: RELENG_6: write(2) fails with EPERM on TCP socket under certain situations
Date: Wed, 12 Apr 2006 03:22:37 +0200

 Can you please try to get rid of IPFW completely?  IPFW does return EPERM (aka
 IP_FW_DENY), and this might mean that tcp-reassembly just breaks an IPFW sanity
 check that is performed even though the ruleset accepts all.
 
 -- 
   Max

From: Xin LI <delphij@delphij.net>
To: Max Laier <max@love2party.net>
Cc: bug-followup@freebsd.org, delphij@freebsd.org
Subject: Re: kern/95559: RELENG_6: write(2) fails with EPERM on TCP socket
	under certain situations
Date: Wed, 12 Apr 2006 10:56:58 +0800

 Hi, Max,
 
 On Wed, 2006-04-12 at 03:22 +0200, Max Laier wrote:
 > Can you please try to get rid of IPFW completely?  IPFW does return EPERM (aka
 > IP_FW_DENY) and this might mean that tcp-reassembly just breaks a IPFW sanity
 > check that is performed eventhough the ruleset accepts all.
 
 Thanks for the hints.
 
 Unfortunately, however, it appears that this does not help :-(  Having
 the two rules still causes EPERM.
 
 Cheers,
 -- 
 Xin LI <delphij delphij net>    http://www.delphij.net/
 
 

From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Xin LI <delphij@FreeBSD.org>
Cc: dhartmei@FreeBSD.org, FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/95559: [RELENG_6] write(2) fails with EPERM on TCP socket under certain situations
Date: Wed, 19 Apr 2006 14:38:35 +0400

   Hi, Xin!
 
 On Mon, Apr 10, 2006 at 09:49:18AM +0800, Xin LI wrote:
 X> >How-To-Repeat:
 X> 
 X> 	First, one should load the following ruleset onto pf(4)
 X> 
 X> --- pf.conf begins here ---
 X> scrub reassemble tcp random-id
 X> set skip on lo0
 X> --- pf.conf ends here ---
 X> 
 X> 	Second, run a cvsupd daemon from the host.
 X> 
 X> 	Third, set up a jail and try to transfer some big data
 X> from the host.
 X> 
 X> 	A ktrace dump is available at:
 X> 		http://www.delphij.net/kdump.txt.bz2
 X> 
 X> 	Please note that the dump is big (about 7MB).
 X> 
 X> >Fix:
 X> 
 X> 	By removing either rule from the pf.conf seems to work
 X> around the issue.  However, we have grep'ed EPERM from netinet
 X> and pf code and found that there is not a reasonable reason
 X> why write(2) would return EPERM in the code path.
 
 I think this behavior is correct. The traffic from host to jail
 is routed through lo0. Within a jail, however, the host's address
 is a foreign one, and thus is routed via some interface, not lo0.
 
 So traffic from host to jail runs through lo0 and traffic from
 jail to host doesn't.
 
 With the above rules you establish TCP scrubbing in pf, which
 requires inspecting and normalizing TCP packets in both
 directions. However, you skip pf processing for one direction,
 so pf sees only half of the TCP connection, assumes the connection
 is bogus, and thus denies it.
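
The asymmetry described above can be sketched as a toy model. This is not pf code; it only illustrates which direction pf would still inspect under the stated assumptions. The interface name "em0" is hypothetical, standing in for "some interface, not lo0".

```python
# Toy model (not pf code): with "set skip on lo0" active, which directions
# of a single host<->jail connection does pf still inspect?

SKIPPED_INTERFACES = {"lo0"}  # taken from the "set skip on lo0" rule

# Assumption from the explanation above: host -> jail traffic uses lo0,
# jail -> host traffic uses some other interface ("em0" is a made-up name).
ROUTE = {"host->jail": "lo0", "jail->host": "em0"}


def inspected_directions(route, skipped):
    """Return the set of directions whose packets pf would inspect."""
    return {direction for direction, ifname in route.items()
            if ifname not in skipped}


seen = inspected_directions(ROUTE, SKIPPED_INTERFACES)
half_view = len(seen) == 1  # pf inspects exactly one of the two directions

print("directions inspected by pf:", sorted(seen))
print("pf sees only half of the connection:", half_view)
```

In this model only the jail-to-host direction is inspected, which matches the "pf sees only half" reasoning. Whether the real routing on the affected box looks like this is exactly what would need to be verified.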
 
 P.S. Maybe Daniel can comment on this.
 
 -- 
 Totus tuus, Glebius.
 GLEBIUS-RIPN GLEB-RIPE

From: Xin LI <delphij@delphij.net>
To: Gleb Smirnoff <glebius@FreeBSD.org>, gnn@FreeBSD.org, Robert Watson <rwatson@FreeBSD.org>, mlaier@FreeBSD.org
Cc: Xin LI <delphij@FreeBSD.org>, dhartmei@FreeBSD.org,  FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/95559: [RELENG_6] write(2) fails with EPERM on TCP socket
	under certain situations
Date: Wed, 19 Apr 2006 18:48:39 +0800

 Hi, Gleb!
 
 On Wed, 2006-04-19 at 14:38 +0400, Gleb Smirnoff wrote:
 > X> 	By removing either rule from the pf.conf seems to work
 > X> around the issue.  However, we have grep'ed EPERM from netinet
 > X> and pf code and found that there is not a reasonable reason
 > X> why write(2) would return EPERM in the code path.
 >
 > I think this behavior is correct. The traffic from host to jail
 > is routed through lo0, however within a jail the hosts address
 > is a foreign one, and thus is routed via some interface, not lo0.
 >
 > So traffic from host to jail runs through lo0 and traffic from
 > jail to host doesn't.
 >
 > With the above rules you establish TCP scurbbing in pf, which
 > requires inspecting and normalizing TCP packets in both
 > directions. However, you skip pf processing for one direction,
 > and pf sees only half of TCP connection and assumes connection
 > bogus and thus denies it.
 
 The strange thing is that the socket of the TCP connection (already in
 the ESTABLISHED state) returns EPERM only after a good number of
 successful write() calls.  Can pf really be seeing only half of the
 TCP connection if it has reached the ESTABLISHED state?
 
 Cheers,
 -- 
 Xin LI <delphij delphij net>    http://www.delphij.net/
 
 

From: Daniel Hartmeier <daniel@benzedrine.cx>
To: Xin LI <delphij@delphij.net>
Cc: Gleb Smirnoff <glebius@FreeBSD.org>, gnn@FreeBSD.org,
        Robert Watson <rwatson@FreeBSD.org>, mlaier@FreeBSD.org,
        Xin LI <delphij@FreeBSD.org>, FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/95559: [RELENG_6] write(2) fails with EPERM on TCP socket under certain situations
Date: Wed, 19 Apr 2006 13:37:52 +0200

 I haven't read all the context yet, but maybe I can tell you how you can
 check whether it's really pf blocking any packets.
 
 If you create a state entry in pf for a TCP connection, pf must see and
 match all packets of that connection against that state entry, otherwise
 things will break. For instance, if pf only associates packets flowing
 in one direction with the state entry, the state entry will never
 advance to 'established' and pf can't track TCP windows and will sooner
 or later start to block packets.
 
 If outgoing packets of one connection are seen (by pf) on interface A,
 but incoming packets of the same connection on a different interface B,
 things still work, if you create a floating state (not using
 'if-bound'). But the direction of packets matters. If pf sees the packets
 of both directions as incoming (on different interfaces), or both as
 outgoing, things break.
 
 To check whether either of those things occurs in your setup, establish
 one connection, then check the following in pf:
 
   a) pfctl -vvss, should show one (or more) states related to the
      connection.
 
      Check the "x:y pkts" part on the third line, it shows how many
      packets pf has associated with the state entry so far. x is
      number of packets in the same direction as the initial packet
      that created the state, y is in the reverse direction. If either
      one of those is >1 but the other ==0, pf doesn't see replies in
      the opposite direction.
 
      The right-most string on the first line tells how advanced the
      state entry is. After a successful TCP handshake, while the
      connection is not closed from either side, it should read
      'ESTABLISHED:ESTABLISHED'.
 
      If there are multiple states related to a single connection,
      make sure each one is created as expected, and advancing normally.
 
      A common mistake, for instance, is to create state not on the
      initial SYN of the TCP handshake, but on a subsequent packet.
      This causes pf to miss the TCP window scaling negotiation, and
      can break connections eventually, after they appear to have been
      established fine and progress to some degree.
 
   b) pfctl -si, check for increasing counters once the problem occurs.
 
      pf will increase at least one of these counters for every packet
      it blocks, for any reason. If no counter is increasing, pf hasn't
      blocked a packet.
 
   c) pfctl -xm, enables debug logging to /var/log/messages. Enable it,
      reproduce the problem, then check the log. If there are any
      messages from pf (like 'BAD state'), those will help analysis.
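
The packet-counter check from step a) can be sketched as a small helper. It only looks for the "x:y pkts" field mentioned above and applies the stated rule (one side >1 while the other is 0). The function name and the sample strings are made up for illustration and are not real pfctl output.

```python
import re

# Matches the "x:y pkts" field described in step a).
PKTS_RE = re.compile(r"(\d+):(\d+) pkts")


def one_sided(state_line):
    """Return True if one direction has >1 packets and the other has 0.

    Returns None when the line carries no "x:y pkts" field.
    """
    match = PKTS_RE.search(state_line)
    if match is None:
        return None
    forward, reverse = int(match.group(1)), int(match.group(2))
    return (forward > 1 and reverse == 0) or (reverse > 1 and forward == 0)


# Hypothetical fragments, not copied from a real "pfctl -vvss" run.
print(one_sided("120:118 pkts"))  # both directions seen
print(one_sided("57:0 pkts"))     # replies never associated with the state
```

Feeding it the third line of each state entry from a saved "pfctl -vvss" listing would flag states where pf has not associated any reply packets, which is the symptom described in step a).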
 
 One explanation why you'd see EPERM is that in FreeBSD, the pfil wrapper
 simply returns pf_test()'s return value. This is either PF_PASS (0) or
 PF_DROP (1), and 1 is also the value of EPERM, by coincidence.
 
 On OpenBSD and NetBSD, the return value PF_DROP of pf_test() is mapped
 to errno 65 EHOSTUNREACH, because most network-related syscalls that can
 now additionally fail due to pf blocking are already documented (in their
 individual man pages) as able to return that errno.
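
The coincidence can be illustrated with a short model of the two behaviours. This is a sketch, not the kernel code: the two wrapper functions are hypothetical stand-ins, and the PF_PASS/PF_DROP values are the ones stated above. Note that the numeric value of EHOSTUNREACH is platform-specific (65 on the BSDs), so the sketch looks it up by name.

```python
import errno
import os

PF_PASS = 0  # values as given in the message above
PF_DROP = 1


def wrapper_passthrough(pf_result):
    """Hypothetical wrapper that returns pf_test()'s value unchanged."""
    return pf_result


def wrapper_mapped(pf_result):
    """Hypothetical wrapper that maps PF_DROP to EHOSTUNREACH."""
    return errno.EHOSTUNREACH if pf_result == PF_DROP else 0


passthrough = wrapper_passthrough(PF_DROP)
mapped = wrapper_mapped(PF_DROP)

# Show how each result would be read if interpreted as an errno.
print(errno.errorcode[passthrough], "-", os.strerror(passthrough))
print(errno.errorcode[mapped], "-", os.strerror(mapped))
```

Read as an errno, the unchanged value 1 is EPERM, which is consistent with the write(2) failure in this report, while the mapped variant would surface as "No route to host" instead.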
 
 While returning EPERM is somewhat intuitive, it's not an errno an
 application must expect to come back from most such syscalls. On the
 other hand, people are regularly confused (on Open/Net) when tools (like
 ping) fail with 'No route to host' due to pf blocking, when the routing
 table is not the problem at all.
 
 Not sure if this is different on FreeBSD intentionally or an oversight.
 From a cross-platform supporter's point of view, it would make things
 easier if it was the same on all platforms :)
 
 HTH,
 Daniel
>Unformatted:
