From michaelb@demon.net  Mon Oct 29 04:40:30 2001
Return-Path: <michaelb@demon.net>
Received: from internal.mail.demon.net (internal.mail.demon.net [193.195.224.3])
	by hub.freebsd.org (Postfix) with ESMTP id 157F637B403
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 29 Oct 2001 04:40:29 -0800 (PST)
Received: from singsing.eng.demon.net (singsing.eng.demon.net [194.217.90.11])
	by internal.mail.demon.net with ESMTP id f9TCeRi06510;
	Mon, 29 Oct 2001 12:40:28 GMT
Received: from michaelb by singsing.eng.demon.net with local (Exim 3.33 #1)
	id 15yBiN-0000cK-00
	for FreeBSD-gnats-submit@freebsd.org; Mon, 29 Oct 2001 12:40:27 +0000
Message-Id: <E15yBiN-0000cK-00@singsing.eng.demon.net>
Date: Mon, 29 Oct 2001 12:40:27 +0000
From: Mike Bristow <michaelb@thus.net>
Reply-To: Mike Bristow <michaelb@thus.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: netgraph bridges cause connectivity probs from bridge
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         31586
>Category:       kern
>Synopsis:       netgraph bridges cause connectivity probs from bridge
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    archie
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 29 04:50:03 PST 2001
>Closed-Date:    Tue Feb 12 10:21:31 PST 2002
>Last-Modified:  Tue Feb 12 10:22:08 PST 2002
>Originator:     Mike Bristow
>Release:        FreeBSD 4.4-STABLE-20011027T210032 i386
>Organization:
THUS PLC
>Environment:
System: FreeBSD singsing 4.4-STABLE-20011027T210032 FreeBSD 4.4-STABLE-20011027T210032 #0: Sat Oct 27 22:33:54 BST 2001 michaelb@singsing:/usr/obj/usr/src/sys/SINGSING-ENG i386

>Description:

Using a suitibly modified copy of
/usr/share/examples/netgraph/ether.bridge, I brought up a bridge
between de0, de1, de2, and xl0 (with xl0 being the "local interface".

The topology is :

   [ singsing ] <-x de0 -> [ win2k box]
                <-x de1 -> [ solaris box ]
                <-x de2 -> [ utterly borked freebsd-current box ]
	        <-- xl0 -> [ summit 48 + rest of network, default route etc ]

where <-x interface -> indicates a crossover cat5 cable, and <--
interface -> indicates a normal bit of cat5

[ this is all done because of a lack of wall ports in the office ]

Machines attached to the de0, de1 or de2 have connectivity problems
from singsing, but they are reachable from other machines on the network.

A "snoop -o snoop.out host singsing and host solaris_box" run on
the solaris box while simultanously running "telnet singsing imap"
in another windown suggest that the TCP checksum is wrong (acording to 
ethereal).  I haven't "snooped" icmp echo requests/responses in
either direction yet; nor did I run tcpdump on "singsing".

Traffic out of the xl0 interface appears to be unaffected.

http://www.urgle.com/~mike/netgraph/ has my modified ether.bridge;
the snoop output as seen by the solaris box; and dmesg output from "singsing".

This problem has appeared between 4.4-STABLE-20011001T111411 &
4.4-STABLE-20011027T210032

(where the timestamp is the time "cvs update" was run against a local
CVS mirror that cvsups hourly)

If additional snoops/tcpdumps on assorted hosts would be useful,
or if anything else would help track this down, let me know.

>How-To-Repeat:

	Set up a bridge using /usr/share/examples/netgraph/ether.bridge
and a LOCAL_IFACE; try and connect from the bridge to machines that
are attached to BRIDGE_IFACES but not LOCAL_IFACE

>Fix:

	An evil workaround is:
	for i in machines attached to de{0,1,2} ; do 
		route add $i $default_route
	done

but this is obviously less than ideal.
>Release-Note:
>Audit-Trail:

From: Archie Cobbs <archie@packetdesign.com>
To: freebsd-gnats-submit@FreeBSD.org, michaelb@thus.net
Cc:  
Subject: Re: kern/31586: netgraph bridges cause connectivity probs from bridge
Date: Sat, 02 Feb 2002 12:47:36 -0800

 I think the problem is the delayed checksum support added to FreeBSD.
 The delayed checksum code makes the (reasonable) assumption in
 ip_output() that a packet routed to a specific interface will actually
 be transmitted by the hardware associated with that interface. This
 is not necessarily true when netgraph is being used.
 
 The solution would be to modify ng_ether() to compute on-demand any
 delayed checksums for packets leaving the 'lower' hook when that hook
 is attached to another node. I'll try to work on this soon.
 
 -Archie
 
 __________________________________________________________________________
 Archie Cobbs     *     Packet Design     *     http://www.packetdesign.com

From: Archie Cobbs <archie@dellroad.org>
To: freebsd-gnats-submit@FreeBSD.org, michaelb@thus.net
Cc:  
Subject: Re: kern/31586: netgraph bridges cause connectivity probs from bridge
Date: Sun, 3 Feb 2002 10:46:56 -0800 (PST)

 Here is a candidate patch. There seem to be two strategies:
 
 #1 Check the flags for mbufs going out the 'lower' hook and
    compute the checksums for packets that need it.
 
 #2 Temporarily turn off the interface flags advertising hardware
    checksums whenever 'lower' is connected.
 
 The patch below implements #2. While #1 might make more sense,
 the problem is that it fails in the case of CSUM_FRAGMENT.
 To handle CSUM_FRAGMENT in case #1, we would have to implement
 IP fragmentation code in ng_ether, yuck... seems much better
 to let ip_output() do it.
 
 The only unknown is whether the drivers will freak out when
 ifp->if_hwassist is changed to zero and back... it doesn't appear
 so; it seems the worst thing that would happen is the chip
 recomputing a checksum, but I'm not positive about that.
 
 -Archie
 
 __________________________________________________________________________
 Archie Cobbs     *     Packet Design     *     http://www.packetdesign.com
 
 Index: sys/netgraph/ng_ether.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/netgraph/ng_ether.c,v
 retrieving revision 1.2.2.10
 diff -u -r1.2.2.10 ng_ether.c
 --- sys/netgraph/ng_ether.c	1 Sep 2001 08:22:29 -0000	1.2.2.10
 +++ sys/netgraph/ng_ether.c	2 Feb 2002 21:46:39 -0000
 @@ -75,6 +75,7 @@
  	u_char		lowerOrphan;	/* whether lower is lower or orphan */
  	u_char		autoSrcAddr;	/* always overwrite source address */
  	u_char		promisc;	/* promiscuous mode enabled */
 +	u_long		hwassist;	/* hardware checksum capabilities */
  };
  typedef struct private *priv_p;
  
 @@ -319,6 +320,7 @@
  	priv->ifp = ifp;
  	IFP2NG(ifp) = node;
  	priv->autoSrcAddr = 1;
 +	priv->hwassist = ifp->if_hwassist;
  
  	/* Try to give the node the same name as the interface */
  	if (ng_name_node(node, name) != 0) {
 @@ -470,6 +472,10 @@
  	if (*hookptr != NULL)
  		return (EISCONN);
  
 +	/* If lower hook connected, disable hardware checksum */
 +	if (hookptr == &priv->lower)
 +		ifp->if_hwassist = 0;
 +
  	/* OK */
  	*hookptr = hook;
  	priv->lowerOrphan = orphan;
 @@ -685,6 +691,7 @@
  {
  	const priv_p priv = hook->node->private;
  
 +	/* Mark hook not connected */
  	if (hook == priv->upper)
  		priv->upper = NULL;
  	else if (hook == priv->lower) {
 @@ -692,8 +699,14 @@
  		priv->lowerOrphan = 0;
  	} else
  		panic("%s: weird hook", __FUNCTION__);
 +
 +	/* If lower hook disconnected, restore hardware checksum */
 +	if (hookptr == &priv->lower)
 +		ifp->if_hwassist = priv->hwassist;
 +
 +	/* Reset node after last disconnect */
  	if (hook->node->numhooks == 0)
 -		ng_rmnode(hook->node);	/* reset node */
 +		ng_rmnode(hook->node);
  	return (0);
  }
  
Responsible-Changed-From-To: freebsd-bugs->archie 
Responsible-Changed-By: archie 
Responsible-Changed-When: Sun Feb 3 21:20:36 PST 2002 
Responsible-Changed-Why:  
My bug to fix. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=31586 

From: Archie Cobbs <archie@packetdesign.com>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/31586: netgraph bridges cause connectivity probs from bridge
Date: Mon, 4 Feb 2002 14:53:28 -0800 (PST)

 OK, below are the 'better' patches for -current and -stable.
 
 The previous patch was broken because it's the 'upper' hook
 we need to watch, not the 'lower' hook.
 
 -Archie
 
 __________________________________________________________________________
 Archie Cobbs     *     Packet Design     *     http://www.packetdesign.com
 
 ============================ current patch ===============================
 
 --- sys/netgraph/ng_ether.c	10 Dec 2001 08:09:47 -0000	1.21
 +++ sys/netgraph/ng_ether.c	3 Feb 2002 19:00:48 -0000
 @@ -75,6 +75,7 @@
  	u_char		lowerOrphan;	/* whether lower is lower or orphan */
  	u_char		autoSrcAddr;	/* always overwrite source address */
  	u_char		promisc;	/* promiscuous mode enabled */
 +	u_long		hwassist;	/* hardware checksum capabilities */
  	u_int		flags;		/* flags e.g. really die */
  };
  typedef struct private *priv_p;
 @@ -317,6 +318,7 @@
  	priv->ifp = ifp;
  	IFP2NG(ifp) = node;
  	priv->autoSrcAddr = 1;
 +	priv->hwassist = ifp->if_hwassist;
  
  	/* Try to give the node the same name as the interface */
  	if (ng_name_node(node, name) != 0) {
 @@ -468,6 +470,10 @@
  	if (*hookptr != NULL)
  		return (EISCONN);
  
 +	/* Disable hardware checksums while 'upper' hook is connected */
 +	if (hookptr == &priv->upper)
 +		priv->ifp->if_hwassist = 0;
 +
  	/* OK */
  	*hookptr = hook;
  	priv->lowerOrphan = orphan;
 @@ -715,9 +721,10 @@
  {
  	const priv_p priv = NG_NODE_PRIVATE(NG_HOOK_NODE(hook));
  
 -	if (hook == priv->upper)
 +	if (hook == priv->upper) {
  		priv->upper = NULL;
 -	else if (hook == priv->lower) {
 +		priv->ifp->if_hwassist = priv->hwassist;  /* restore h/w csum */
 +	} else if (hook == priv->lower) {
  		priv->lower = NULL;
  		priv->lowerOrphan = 0;
  	} else
 
 ============================ stable patch ===============================
 
 --- sys/netgraph/ng_ether.c	1 Sep 2001 08:22:29 -0000	1.2.2.10
 +++ sys/netgraph/ng_ether.c	3 Feb 2002 19:00:07 -0000
 @@ -75,6 +75,7 @@
  	u_char		lowerOrphan;	/* whether lower is lower or orphan */
  	u_char		autoSrcAddr;	/* always overwrite source address */
  	u_char		promisc;	/* promiscuous mode enabled */
 +	u_long		hwassist;	/* hardware checksum capabilities */
  };
  typedef struct private *priv_p;
  
 @@ -319,6 +320,7 @@
  	priv->ifp = ifp;
  	IFP2NG(ifp) = node;
  	priv->autoSrcAddr = 1;
 +	priv->hwassist = ifp->if_hwassist;
  
  	/* Try to give the node the same name as the interface */
  	if (ng_name_node(node, name) != 0) {
 @@ -470,6 +472,10 @@
  	if (*hookptr != NULL)
  		return (EISCONN);
  
 +	/* Disable hardware checksums while 'upper' hook is connected */
 +	if (hookptr == &priv->upper)
 +		priv->ifp->if_hwassist = 0;
 +
  	/* OK */
  	*hookptr = hook;
  	priv->lowerOrphan = orphan;
 @@ -685,9 +691,10 @@
  {
  	const priv_p priv = hook->node->private;
  
 -	if (hook == priv->upper)
 +	if (hook == priv->upper) {
  		priv->upper = NULL;
 -	else if (hook == priv->lower) {
 +		priv->ifp->if_hwassist = priv->hwassist;  /* restore h/w csum */
 +	} else if (hook == priv->lower) {
  		priv->lower = NULL;
  		priv->lowerOrphan = 0;
  	} else

From: Mike Bristow <michaelb@demon.net>
To: Archie Cobbs <archie@dellroad.org>,
	<freebsd-gnats-submit@freebsd.org>, <michaelb@thus.net>
Cc:  
Subject: Re: kern/31586: netgraph bridges cause connectivity probs from
	bridge
Date: Tue, 05 Feb 2002 12:17:10 +0000

 On 3/2/02 6:46 pm, "Archie Cobbs" <archie@dellroad.org> wrote:
 
 > Here is a candidate patch.
 
 The problem seems to be resolved when I apply the -stable version of the
 improved patch you mailed me.  I'll leave the box running with the patch
 applied and let you know if there are any problems.
 
 (This mail is being sent from my laptop which treats the bridge as the IMAP
 server and SMTP smarthost)
 
 -- 
 Mike Bristow
 
 Sent using the Entourage X Test Drive.
 
State-Changed-From-To: open->closed 
State-Changed-By: archie 
State-Changed-When: Tue Feb 12 10:21:31 PST 2002 
State-Changed-Why:  
Patches applied to -current and -stable. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=31586 
>Unformatted:
