From andre.albsmeier@mchp.siemens.de  Tue Jun  5 09:30:17 2001
Return-Path: <andre.albsmeier@mchp.siemens.de>
Received: from goliath.siemens.de (goliath.siemens.de [194.138.37.131])
	by hub.freebsd.org (Postfix) with ESMTP id 7F72B37B406
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  5 Jun 2001 09:30:16 -0700 (PDT)
	(envelope-from andre.albsmeier@mchp.siemens.de)
Received: from mail3.siemens.de (mail3.siemens.de [139.25.208.14])
	by goliath.siemens.de (8.11.1/8.11.1) with ESMTP id f55GUE409146
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 5 Jun 2001 18:30:14 +0200 (MET DST)
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.42.7])
	by mail3.siemens.de (8.11.1/8.11.1) with ESMTP id f55GUEV10711203
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 5 Jun 2001 18:30:14 +0200 (MEST)
Received: (from localhost)
	by curry.mchp.siemens.de (8.11.3/8.11.3) id f55GUEu36178
	for FreeBSD-gnats-submit@freebsd.org; Tue, 5 Jun 2001 18:30:14 +0200 (CEST)
Message-Id: <200106051630.f55GUEt76616@curry.mchp.siemens.de>
Date: Tue, 5 Jun 2001 18:30:14 +0200 (CEST)
From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: FreeBSD not always seems to take the best route
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         27890
>Category:       kern
>Synopsis:       FreeBSD not always seems to take the best route
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jun 05 09:40:00 PDT 2001
>Closed-Date:    Wed Jun 6 07:27:57 PDT 2001
>Last-Modified:  Wed Jun  6 08:00:01 PDT 2001
>Originator:     Andre Albsmeier
>Release:        FreeBSD 4.3-STABLE i386
>Organization:
>Environment:

System: 4.3-STABLE #67: Fri Jun 1 12:49:52 CEST 2001

>Description:

I have observed this behaviour for a long time but only now had
time to dig into it. I use syslogd as an example here, but I think
the problem lies in the kernel's network code.


Simple network:
 - two routers (1 and 2)
 - host C with IP 192.168.1.3
 - host S with IP 192.168.2.1

All machines are FreeBSD 4.3-STABLE.

Router 1 routes packets between the Internet and 192.168.1.0.
Router 2 routes packets between 192.168.1.0 and 192.168.2.0.


           +-----+                 +-----+
  default  |     |   192.168.1.0   |     |   192.168.2.0
-----------|  1  |--------+--------|  2  |--------+-------- more hosts
           |     |        |        |     |        |
           +-----+        |        +-----+        |
                          |                       |
                       +-----+                 +-----+
                       |     |                 |     |
           192.168.1.3 |  C  |                 |  S  | 192.168.2.1
                       |     |                 |     |
                       +-----+                 +-----+


Relevant parts of netstat -rn on C during normal operation:
-------------------------------------------------------------
Destination        Gateway            Flags     Netif Expire
default            192.168.1.1        UGSc      fxp0
127.0.0.1          127.0.0.1          UH        lo0
192.168.1          link#1             UC        fxp0 =>
192.168.1.1        0:e0:18:90:91:bb   UHLW      fxp0   1182
192.168.1.2        0:e0:18:90:94:c8   UHLW      fxp0   1058
192.168.1.3        0:e0:18:90:45:dc   UHLW      lo0
192.168.1.255      ff:ff:ff:ff:ff:ff  UHLWb     fxp0
192.168.2          192.168.1.2        UGc       fxp0
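
As a rough illustration of what "best route" means here, the following Python sketch performs a longest-prefix match over the gateway column of C's table. It is a toy model, not the kernel's radix-tree lookup: it assumes /24 netmasks for the `192.168.1' and `192.168.2' entries, ignores the flags, and omits the host (ARP) entries. The helper name best_route() is hypothetical.

```python
import ipaddress

# Destination -> gateway pairs from C's table (host/ARP entries omitted,
# /24 netmasks assumed for the 192.168.1 and 192.168.2 entries).
ROUTES = {
    "0.0.0.0/0": "192.168.1.1",       # default, via router 1
    "192.168.1.0/24": "link#1",       # directly attached network
    "192.168.2.0/24": "192.168.1.2",  # via router 2
}


def best_route(dst, routes):
    """Return (network, gateway) for the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), gw)
        for net, gw in routes.items()
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        raise LookupError(f"no route to {dst}")
    # Longest prefix (largest prefixlen) wins; the default route has prefixlen 0.
    return max(matches, key=lambda m: m[0].prefixlen)


if __name__ == "__main__":
    print(best_route("192.168.2.1", ROUTES))
    without_net2 = {k: v for k, v in ROUTES.items() if k != "192.168.2.0/24"}
    print(best_route("192.168.2.1", without_net2))
```

With the 192.168.2 entry present, 192.168.2.1 resolves to gateway 192.168.1.2 (router 2); with it removed, the lookup falls back to the default route via 192.168.1.1 (router 1).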


syslogd on host C is configured to forward messages to the
syslogd running on host S. This works perfectly; all messages
appear on host S.

Now we delete the route to net 192.168.2.0 on host C (this
can happen automatically if router 2 and/or its routed goes
down for a while). If syslogd now sends a message to S, the
kernel uses the default route, as expected, because the route
to net 192.168.2.0 is gone. We can see the packets arrive at
router 1. I consider this correct behaviour as well.

Now we restore the route to net 192.168.2.0 on host C exactly
as it was before (e.g. by restarting router 2 and/or its routed).
We can verify this with netstat -rn on C. We can also ping host S,
telnet to it, or do other things, all of which work perfectly.

The problem is that whenever syslogd on C sends a packet to S,
the kernel still uses router 1, even though it should send the
packets through router 2. After HUPing or restarting syslogd on C
(which means that the UDP socket is closed and opened again),
things are back to normal.

It seems that as long as packets can be sent somewhere, the
kernel doesn't bother checking whether there is a better route
to the destination until the socket is closed and opened again.

>How-To-Repeat:

See above.

>Fix:

Unknown. I am happy to test suggestions, of course.
>Release-Note:
>Audit-Trail:

From: David Malone <dwmalone@maths.tcd.ie>
To: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Tue, 5 Jun 2001 20:05:57 +0100

 On Tue, Jun 05, 2001 at 06:30:14PM +0200, Andre Albsmeier wrote:
 > The problem is that each time when syslogd on C wants to send
 > a packet to S, the kernel still uses 1 as router even though
 > it should send them through 2.  After HUPing or restarting
 > syslogd on C (which means that the UDP socket is closed and
 > opened again) things are back to normal.
 
 This sounds like it is to do with the caching of recently used
 routes. Does the effect go away if you leave it for a while?
 Adjusting some of the following sysctls might change how long
 you have to wait:
 
 net.inet.ip.rtexpire
 net.inet.ip.rtminexpire
 net.inet.ip.rtmaxcache
 
 (Some of the networking people could definitely provide more
 details.)
 
 	David.

From: Ruslan Ermilov <ru@FreeBSD.org>
To: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Wed, 6 Jun 2001 11:24:19 +0300

 On Tue, Jun 05, 2001 at 06:30:14PM +0200, Andre Albsmeier wrote:
 > [...]
 > >How-To-Repeat:
 > 
 > See above.
 > 
 I can't reproduce this problem on my 4.3-STABLE box.
 
 Yes, the UDP socket initially holds a reference to the protocol-cloned
 route to the destination host S through router 1, and UDP packets go
 through that router.
 
 In my tests, router 1 (192.168.1.1) was a host *not* configured
 to act as a router, so all "foreign" packets sent to it were
 silently ignored.  I used the ports/net/netcat utility to connect
 to the UDP `echo' port of the destination S (192.168.2.1):
 
 Fig.1: Initial state, before the UDP socket is opened.
 
 : # netstat -arn
 : Destination        Gateway            Flags     Refs     Use     Netif Expire
 : default            192.168.1.1        UGSc        0        2      rl0
 : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 : 192.168.1          link#1             UC          3        0      rl0 =>
 
 
 Fig.2: We connect(2) a UDP socket to the "echo" port on S (192.168.2.1).
 
 : # nc -u 192.168.2.1 echo
 : ping1
 : ping2
 : ping3
 [...]
 
 As you can see, we receive no echoes back.
 
 
 Fig.3: Routing table after the UDP socket is opened.
 
 : # netstat -arn
 : Destination        Gateway            Flags     Refs     Use     Netif Expire
 : default            192.168.1.1        UGSc        1        2      rl0
 : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 : 192.168.1          link#1             UC          4        0      rl0 =>
 : 192.168.2.1        192.168.1.1        UGHW        1       14      rl0
 
 The route to S (192.168.2.1) was cloned (W) from the `default' route.
 refcnt=1 on the 192.168.2.1 route indicates that the UDP socket holds
 a reference to this route.
 
 Fig.4: I manually add the route to the 192.168.2 network.
 
 : # route add -net 192.168.2   192.168.1.2 
 : add net 192.168.2: gateway 192.168.1.2 
 
 Fig.5: Routing table after the route to the 192.168.2 network was added.
 
 : # netstat -arn
 : Destination        Gateway            Flags     Refs     Use     Netif Expire
 : default            192.168.1.1        UGSc        1        2      rl0
 : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 : 192.168.1          link#1             UC          4        0      rl0 =>
 : 192.168.2          192.168.1.2        UGSc        0        0      rl0
 
 As you can see, the route to the 192.168.2.1 host has been deleted from the
 routing table.  It is not actually freed, because it has a non-zero reference
 count (the UDP socket still holds it); instead, it is marked DOWN, and it
 will be freed and reallocated in ip_output() on the next use.
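
The delete-while-referenced behaviour described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not FreeBSD's rtentry code: Route, Table and output() are hypothetical names, output() only imitates the "reuse unless DOWN" check attributed to ip_output(), and route cloning is not modelled, so the sketch deletes the cloned host route explicitly where the kernel removed it when the network route was added.

```python
import ipaddress


class Route:
    """Toy routing entry: destination network, gateway, refcount, UP flag."""

    def __init__(self, net, gateway):
        self.net = ipaddress.ip_network(net)
        self.gateway = gateway
        self.refcnt = 0
        self.up = True


class Table:
    def __init__(self):
        self.routes = []

    def add(self, net, gateway):
        self.routes.append(Route(net, gateway))

    def delete(self, net):
        net = ipaddress.ip_network(net)
        for rt in [r for r in self.routes if r.net == net]:
            self.routes.remove(rt)  # unlinked from the table...
            rt.up = False           # ...and marked DOWN; holders keep the object

    def alloc(self, dst):
        """Longest-prefix match; the caller takes a reference."""
        addr = ipaddress.ip_address(dst)
        rt = max((r for r in self.routes if addr in r.net),
                 key=lambda r: r.net.prefixlen)
        rt.refcnt += 1
        return rt


def output(table, cached, dst):
    """Reuse the cached route unless it is missing or DOWN."""
    if cached is None or not cached.up:
        if cached is not None:
            cached.refcnt -= 1      # release the stale (DOWN) entry
        cached = table.alloc(dst)
    return cached
```

In this model, calling output() after the host route is deleted and the 192.168.2 network route is added returns a freshly allocated entry via 192.168.1.2, roughly mirroring Fig.6 and Fig.7.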
 
 Fig.6: We continue to send UDP datagrams.
 
 : # nc -u 192.168.2.1 echo (continued)
 : ping4
 : ping4
 : ping5
 : ping5
 : ping6
 : ping6
 
 As you can see, this time we get the echoes back.
 
 Fig.7: Routing table after we sent more UDP datagrams.
 
 : # netstat -arn -finet
 : Destination        Gateway            Flags     Refs     Use     Netif Expire
 : default            192.168.1.1        UGSc        0        2      rl0
 : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 : 192.168.1          link#1             UC          4        0      rl0 =>
 : 192.168.2          192.168.1.2        UGSc        1        3      rl0
 
 The refcount on the 192.168.2 route has grown to 1, indicating that the
 UDP socket now holds a reference to this route.  The `Use' count of 3
 corresponds to our three UDP datagrams (ping4, ping5, and ping6).
 
 Could you please repeat these steps in your environment and try to
 find where the behaviour differs in your case?
 
 
 Cheers,
 -- 
 Ruslan Ermilov		Oracle Developer/DBA,
 ru@sunbay.com		Sunbay Software AG,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.512.251	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age

From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: Ruslan Ermilov <ru@FreeBSD.org>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	bug-followup@FreeBSD.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Wed, 6 Jun 2001 12:29:04 +0200

 Thanks for helping...
 
 On Wed, 06-Jun-2001 at 11:24:19 +0300, Ruslan Ermilov wrote:
 >
 > ...
 > 
 > I can't reproduce this problem on my 4.3-STABLE box.
 > 
 > Yes, the UDP socket has the reference to the protocol-cloned
 > route to the destination host S through the router 1 initially,
 > and UDP packets go through that router.
 > 
 > In my tests, router 1 (192.168.1.1) was the host *not* configured
 > to act as the router, so all "foreign" packets sent to it got
 
 OK, I have blocked packets coming from C on router 1, so
 I think I have the same configuration as you.
 
 
 > silently ignored.  I used the ports/net/netcat utility to connect
 > to the UDP `echo' port of the destination S (192.168.2.1):
 > 
 > Fig.1: Initial state, before UDP socket is open.
 > 
 > : # netstat -arn
 > : Destination        Gateway            Flags     Refs     Use     Netif Expire
 > : default            192.168.1.1        UGSc        0        2      rl0
 > : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 > : 192.168.1          link#1             UC          3        0      rl0 =>
 > 
 > 
 > Fig.2: We connect(2) UDP socket to the "echo" port on S (192.168.2.1).
 > 
 > : # nc -u 192.168.2.1 echo
 > : ping1
 > : ping2
 > : ping3
 > [...]
 > 
 > As you can see, we receive no echoes back.
 
 OK, same here.
 
 
 > Fig.3: Routing table after UDP socket is open.
 > 
 > : # netstat -arn
 > : Destination        Gateway            Flags     Refs     Use     Netif Expire
 > : default            192.168.1.1        UGSc        1        2      rl0
 > : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 > : 192.168.1          link#1             UC          4        0      rl0 =>
 > : 192.168.2.1        192.168.1.1        UGHW        1       14      rl0
 > 
 > The route to S (192.168.2.1) was cloned (W) from the `default' route.
 > refcnt=1 on the 192.168.2.1 route indicates that the UDP socket holds
 > a reference to this route.
 
 Same here:
 
 192.168.2.1       192.168.1.1        UGHW        1      425     fxp0
 
 
 > Fig.4: I manually add the route to the 192.168.2 network.
 > 
 > : # route add -net 192.168.2   192.168.1.2 
 > : add net 192.168.2: gateway 192.168.1.2 
 
 OK, I don't add it manually; instead I wait until routed messages
 from 192.168.1.2 bring it back.
 
 
 > 
 > Fig.5: Routing table after the route to the 192.168.2 network was added.
 > 
 > : # netstat -arn
 > : Destination        Gateway            Flags     Refs     Use     Netif Expire
 > : default            192.168.1.1        UGSc        1        2      rl0
 > : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 > : 192.168.1          link#1             UC          4        0      rl0 =>
 > : 192.168.2          192.168.1.2        UGSc        0        0      rl0
 
 Yup, same here
 
 
 > As you can see, the route to the 192.168.2.1 host is deleted from the routing
 > table.  It actually doesn't get freed completely, as it had non-zero reference
 > count (UDP socket still holds on it), but instead it gets marked as DOWN, and
 > will be freed and reallocated in ip_output() on the next use.
 > 
 > Fig.6: We continue to send UDP datagrams.
 > 
 > : # nc -u 192.168.2.1 echo (continued)
 > : ping4
 > : ping4
 > : ping5
 > : ping5
 > : ping6
 > : ping6
 > 
 > As you can see, this time we get the echoes back.
 
 Yes, same here :-(
 
 
 > Fig.7: Routing table after we sent more UDP datagrams.
 > 
 > : # netstat -arn -finet
 > : Destination        Gateway            Flags     Refs     Use     Netif Expire
 > : default            192.168.1.1        UGSc        0        2      rl0
 > : 127.0.0.1          127.0.0.1          UH          1        6      lo0
 > : 192.168.1          link#1             UC          4        0      rl0 =>
 > : 192.168.2          192.168.1.2        UGSc        1        3      rl0
 > 
 > The refcount on 192.168.2 route has grown to 1, indicating that the
 > UDP socket now holds on this route.  The `Use' count of 3 corresponds
 > to our three UDP datagrams (ping4, ping5, and ping6).
 > 
 > Could you please repeat these steps in your environment, and try to
 > detect where it behaved differently in your case.
 
 It doesn't behave differently, that's interesting. May I ask you to
 try it using syslogd?
 
 - Let host C log to host S (with the route installed).
 - Watch C's messages appear on S.
 - Delete C's route to S (via router 2)
 - Let host C log again (run tcpdump on router 1 to see the packets come in)
 - Install the route to S (via router 2) again on C
 - Log more messages. If you no longer see the packets going to
   router 1, I am really lost...
 
 Thanks,
 
 	-Andre

From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: David Malone <dwmalone@maths.tcd.ie>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Wed, 6 Jun 2001 13:19:26 +0200

 On Tue, 05-Jun-2001 at 20:05:57 +0100, David Malone wrote:
 > On Tue, Jun 05, 2001 at 06:30:14PM +0200, Andre Albsmeier wrote:
 > > The problem is that each time when syslogd on C wants to send
 > > a packet to S, the kernel still uses 1 as router even though
 > > it should send them through 2.  After HUPing or restarting
 > > syslogd on C (which means that the UDP socket is closed and
 > > opened again) things are back to normal.
 > 
 > This sounds like it is to do with the caching of recently used
 > routes. Does the effect go away if you leave it for a while?
 
 I have left it running for 3 hours now, but nothing changed. As
 soon as I HUPed syslogd, it worked.
 
 
 > Adjusting some of the following sysctls might change how long
 > you have to wait:
 > 
 > net.inet.ip.rtexpire
 > net.inet.ip.rtminexpire
 > net.inet.ip.rtmaxcache
 > 
 > (Some of the networking people could definitely provide more
 > details.)
 
 Ruslan Ermilov <ru@FreeBSD.org> sent me an email asking me to
 reproduce the behaviour with netcat. That worked properly, so
 maybe it really is an issue with syslogd. I will try to isolate
 the problem as soon as Ruslan can reproduce it with syslogd.
 
 Thanks,
 
 	-Andre

From: Ruslan Ermilov <ru@FreeBSD.org>
To: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Wed, 6 Jun 2001 15:32:05 +0300

 On Wed, Jun 06, 2001 at 12:29:04PM +0200, Andre Albsmeier wrote:
 > [...]
 > It doesn't behave differently, that's interesting. May I ask you to
 > try it using syslogd?
 > 
 > - Let host C log to host S (with the route installed).
 > - Watch C's messages appear on S.
 > - Delete C's route to S (via router 2)
 > - Let host C log again (run tcpdump on router 1 to see the packets come in)
 > - Install the route to S (via router 2) again on C
 > - Log more stuff. If you don't see the packets go into router 1 anymore
 >   I am really lost...
 > 
 Yes, I have reproduced the problem here.  My test was missing one
 step.  Now, here is what happens.
 
 Initially, there is a route to S (192.168.2.1), cloned from the
 network route, through router 2 (192.168.1.2), and the UDP socket
 uses this route.  When this route and the 192.168.2 network route
 disappear, ip_output() detects on the next write (!) that the S
 route is DOWN and "allocates" (caches) another route, which happens
 to be the "default" route pointing to router 1 (192.168.1.1).
 Later, when the route to the 192.168.2 network is installed again,
 it is not taken into account, because the cached ("default") route
 is still UP.
 
 Unfortunately, there is no easy way to fix this: looking up the
 best-match route on every write may be too time-consuming.  As a
 workaround, you can delete and re-add your "default" route; this
 worked for me here.  `route delete default' removes the "default"
 route from the routing table, but because its refcnt is > 0 it is
 not freed immediately; instead, it is marked DOWN.  On this UDP
 socket's next write, ip_output() will detect that the cached route
 is DOWN, free it, and allocate a new route, which this time will be
 the route to the 192.168.2 network through router 2 (192.168.1.2).
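
The sequence described in the last two paragraphs can be replayed with a small Python model. It is a sketch built on simplifying assumptions, not kernel code: the socket caches the network route directly (the cloned host route is not modelled), deleting a route simply marks it DOWN, and Route, Table, CachedSocket and scenario() are hypothetical names. The only re-lookup trigger is a DOWN cached route, which is the behaviour reported in this PR.

```python
import ipaddress


class Route:
    def __init__(self, net, gateway):
        self.net = ipaddress.ip_network(net)
        self.gateway = gateway
        self.up = True  # cleared when the route is deleted


class Table:
    def __init__(self):
        self.routes = []

    def add(self, net, gateway):
        self.routes.append(Route(net, gateway))

    def delete(self, net):
        net = ipaddress.ip_network(net)
        for rt in [r for r in self.routes if r.net == net]:
            self.routes.remove(rt)
            rt.up = False  # a holder of this entry now sees it as DOWN

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        return max((r for r in self.routes if addr in r.net),
                   key=lambda r: r.net.prefixlen)


class CachedSocket:
    """Connected UDP socket that caches one route between writes."""

    def __init__(self, table, dst):
        self.table, self.dst, self.cached = table, dst, None

    def send(self):
        # Only a missing or DOWN cached route triggers a new lookup;
        # a better route added later is never noticed.
        if self.cached is None or not self.cached.up:
            self.cached = self.table.lookup(self.dst)
        return self.cached.gateway


def scenario(workaround=False):
    """Return the gateway used for each write to S (192.168.2.1)."""
    t = Table()
    t.add("0.0.0.0/0", "192.168.1.1")
    t.add("192.168.2.0/24", "192.168.1.2")
    s = CachedSocket(t, "192.168.2.1")
    hops = [s.send()]                      # normal operation: router 2
    t.delete("192.168.2.0/24")
    hops.append(s.send())                  # cached route DOWN: default gets cached
    t.add("192.168.2.0/24", "192.168.1.2")
    hops.append(s.send())                  # default still UP: stays on router 1
    if workaround:
        t.delete("0.0.0.0/0")              # `route delete default'
        t.add("0.0.0.0/0", "192.168.1.1")  # ...and re-add it
        hops.append(s.send())              # DOWN again: re-lookup picks router 2
    return hops
```

scenario() returns ['192.168.1.2', '192.168.1.1', '192.168.1.1'], i.e. the socket stays on router 1 after the network route comes back; scenario(workaround=True) adds a fourth write that goes through 192.168.1.2 again.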
 
 A real fix would be for the routing code to notify the protocol
 whenever its routing table is modified.  This notification could
 be saved in a variable as a timestamp, and every PCB-cached route
 could carry a similar timestamp recording when it was cached.
 ip_output() would then "invalidate" a cached route if it was cached
 before the last routing table modification.
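
The proposed fix can be sketched as a generation check. The sketch below substitutes a monotonically increasing counter for the suggested timestamp (an assumption, chosen so that two changes within one clock tick cannot be confused with each other); GenTable and Pcb are hypothetical names, and the model ignores locking and everything else a real kernel change would need.

```python
import ipaddress


class GenTable:
    """Routing table with a modification counter (stand-in for the timestamp)."""

    def __init__(self):
        self.routes = {}  # network -> gateway
        self.gen = 0      # bumped on every add/delete

    def add(self, net, gateway):
        self.routes[ipaddress.ip_network(net)] = gateway
        self.gen += 1

    def delete(self, net):
        del self.routes[ipaddress.ip_network(net)]
        self.gen += 1

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        net = max((n for n in self.routes if addr in n),
                  key=lambda n: n.prefixlen)
        return self.routes[net]


class Pcb:
    """Per-socket route cache that remembers the table generation it saw."""

    def __init__(self, table, dst):
        self.table, self.dst = table, dst
        self.gateway, self.cached_gen = None, -1

    def output(self):
        # Invalidate the cached route if the table changed since caching.
        if self.cached_gen != self.table.gen:
            self.gateway = self.table.lookup(self.dst)
            self.cached_gen = self.table.gen
        return self.gateway
```

In this model the per-write cost is one integer comparison rather than a full best-match lookup, which addresses the concern that a lookup on every write may be too expensive; the trade-off is that any change to the table invalidates every cached route, not only the affected ones.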
 
 I could probably try to implement this, if no one else can
 come up with a better idea.
 
 
 Cheers,
State-Changed-From-To: open->closed 
State-Changed-By: ru 
State-Changed-When: Wed Jun 6 07:27:57 PDT 2001 
State-Changed-Why:  
The analysis shows that this PR is a duplicate of PR kern/10778. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=27890 

From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: Ruslan Ermilov <ru@FreeBSD.org>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	bug-followup@FreeBSD.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Wed, 6 Jun 2001 16:29:33 +0200

 On Wed, 06-Jun-2001 at 15:32:05 +0300, Ruslan Ermilov wrote:
 > On Wed, Jun 06, 2001 at 12:29:04PM +0200, Andre Albsmeier wrote:
 > > [...]
 > > It doesn't behave differently, that's interesting. May I ask you to
 > > try it using syslogd?
 > > 
 > > - Let host C log to host S (with the route installed).
 > > - Watch C's messages appear on S.
 > > - Delete C's route to S (via router 2)
 > > - Let host C log again (run tcpdump on router 1 to see the packets come in)
 > > - Install the route to S (via router 2) again on C
 > > - Log more stuff. If you don't see the packets go into router 1 anymore
 > >   I am really lost...
 > > 
 > Yes, I have reproduced the problem here.  My test misses one step.
 
 Hmm, I just wonder why syslogd behaves differently...
 
 > OK, now about what happens here.
 > 
 > Initially, there is the route (cloned from the network route) to S
 > (192.168.2.1) through the router 2 (192.168.1.2).  UDP socket uses
 > this route initially.  When this (and the 192.168.2 network) routes
 > disappear, on the next write (!), ip_output() detects that the S
 > route is DOWN, and "allocates" (caches) another route, which happens
 > to be the "default" route pointing to router 1 (192.168.1.1).
 > Later, when the route to the 192.168.2 network gets installed again,
 > it's not taken into account, as the cached ("default") route is still
 > UP.
 
 So this matches my (rather amateurish) description:
 
 It seems that as long as packets can be sent somewhere, the
 kernel doesn't bother checking whether there is a better route
 to the destination until the socket is closed and opened again.
 
 
 > Unfortunately, there is no easy way to fix this.  Checking for
 > the best-match route on every write may be too time consuming.
 > As the workaround, you can delete and re-add your "default"
 > route.  This worked for me here.  `route delete default' will
 
 Just tried it; it worked here as well.
 
 > delete the "default" route from the routing table, but because
 > it has a refcnt>0 will not delete it immediately, but will mark
 > it as DOWN.  ip_output() for this UDP socket's write will detect
 > that the cached route is DOWN, will free it, and allocate a new
 > route, which will be the route to the 192.168.2 network through
 > router 2 (192.168.1.2) this time.
 > 
 > The actual fix would be to notify protocol (from within the
 > routing code) whenever its routing table is modified.  This
 > notification could then be saved in a variable as timestamp,
 > and every PCB-cached route could have a similar timestamp as
 > well, indicating when this "caching" took place.  Having
 > that, ip_output() would "invalidate" cached route if it was
 > cached before the last routing table modification was done.
 > 
 > I could probably try to implement this, if no one else can
 > come up with a better idea.
 
 I can only offer to test any new code, since my knowledge of
 the relevant parts of the kernel is not sufficient to
 implement it.
 
 Thanks so far,
 
 	-Andre

From: Ruslan Ermilov <ru@FreeBSD.org>
To: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Wed, 6 Jun 2001 17:56:15 +0300

 On Wed, Jun 06, 2001 at 04:29:33PM +0200, Andre Albsmeier wrote:
 > On Wed, 06-Jun-2001 at 15:32:05 +0300, Ruslan Ermilov wrote:
 > > On Wed, Jun 06, 2001 at 12:29:04PM +0200, Andre Albsmeier wrote:
 > > > [...]
 > > > It doesn't behave differently, that's interesting. May I ask you to
 > > > try it using syslogd?
 > > > 
 > > > - Let host C log to host S (with the route installed).
 > > > - Watch C's messages appear on S.
 > > > - Delete C's route to S (via router 2)
 > > > - Let host C log again (run tcpdump on router 1 to see the packets come in)
 > > > - Install the route to S (via router 2) again on C
 > > > - Log more stuff. If you don't see the packets go into router 1 anymore
 > > >   I am really lost...
 > > > 
 > > Yes, I have reproduced the problem here.  My test misses one step.
 > 
 > Hmm, I just wonder why syslogd behaves differently...
 > 
 Because my test missed one step: to reproduce this with netcat(1),
 the route to S through router 2 must exist initially.  You then
 send some data, delete the route, send data again so that the
 "default" route gets cached, and finally install the route to S
 again.
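
For anyone repeating these steps without netcat, the following Python sketch does what `nc -u' does here: it connect(2)s a single UDP socket and keeps writing to it, so that, on the kernel discussed in this PR, the per-socket route cache stays in play between route changes. send_numbered() is a hypothetical helper; the one-second default interval and the use of the UDP echo port (7) in the example are assumptions.

```python
import socket
import time


def send_numbered(host, port, count, interval=1.0):
    """Send b"ping1\\n", b"ping2\\n", ... over ONE connected UDP socket.

    The socket is connected once and reused for every datagram, like
    `nc -u' or a long-running syslogd, instead of opening a new socket
    per message.
    """
    sent = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((host, port))
        for i in range(1, count + 1):
            msg = f"ping{i}\n".encode()
            s.send(msg)
            sent.append(msg)
            if i < count:
                time.sleep(interval)  # time to change routes between writes
    return sent
```

For example, send_numbered("192.168.2.1", 7, 60) sends one datagram per second for a minute, leaving time to delete and re-add the route to S with route(8) while tcpdump runs on router 1.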
 
 
>Unformatted:
