From nobody@FreeBSD.org  Mon Jul 23 13:50:13 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 15A4837B412
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 23 Jul 2001 13:50:13 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.4/8.11.4) id f6NKoDT42365;
	Mon, 23 Jul 2001 13:50:13 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200107232050.f6NKoDT42365@freefall.freebsd.org>
Date: Mon, 23 Jul 2001 13:50:13 -0700 (PDT)
From: Voradesh Yenbut <yenbut@cs.washington.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: ARP request fails after "bad gateway value" in if_ether.c
X-Send-Pr-Version: www-1.0

>Number:         29170
>Category:       kern
>Synopsis:       [patch] ARP request fails after "bad gateway value" in if_ether.c
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    remko
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 23 14:00:00 PDT 2001
>Closed-Date:    Thu Feb 21 17:57:04 UTC 2008
>Last-Modified:  Thu Feb 21 17:57:04 UTC 2008
>Originator:     Voradesh Yenbut
>Release:        4.2, 3.4
>Organization:
CSE, U of Washington
>Environment:
FreeBSD bs8.cs.washington.edu 4.2-RELEASE FreeBSD 4.2-RELEASE #2: Mon Jul 23 12:13:29 PDT 2001     root@orion.cs.washington.edu:/usr/src/sys/compile/BS-GENERIC  i386

>Description:

We have several FreeBSD systems running DNS servers.  For unknown
reasons, one of them, which serves a subnet where most clients run
Windows 2000, occasionally failed to perform ARP address resolution.

The kernel logged messages like the following:

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.74 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.74rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.233 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.233rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.232 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.232rt

  arplookup 128.95.8.233 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.233rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.230 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.230rt

  arp_rtrequest: bad gateway value
  arplookup 128.95.8.160 failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for 128.95.8.160rt

ARP requests to the addresses above failed afterward.  A system reboot
makes ARP requests work again, but sooner or later the same problem
comes back.

Searching the FreeBSD mailing lists, I found several reports of
similar problems, but no good solution.


>How-To-Repeat:
I don't know how to repeat this, but it can be simulated by adding a
condition to arp_rtrequest() in /usr/src/sys/netinet/if_ether.c that
breaks out of RTM_RESOLVE.  For example,

 the following code uses a static variable:

   static int toggle = 1;  /* added */

 to simulate a single "bad gateway value" fault.

              case RTM_RESOLVE:
                if (gate->sa_family != AF_LINK ||
                    toggle ||                           /* added */
                    gate->sa_len < sizeof(null_sdl)) {
                       log(LOG_DEBUG, "arp_rtrequest: bad gateway value\n");
                       if (toggle) toggle = 0;          /* added */
                       break;
                 }

 After a system reboot, the system will log "arp_rtrequest: bad
 gateway value" for the first host it tries to contact, which is
 likely to be its default gateway.  Even though toggle's value
 is then 0, subsequent attempts to contact that host generate these messages:

  arplookup xx.xx.x.xxx failed: could not allocate llinfo
  arpresolve: can't allocate llinfo for xx.xx.xx.xxrt

This leads me to believe that a route is not automatically cleaned up
when an error occurs while it is being resolved.
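A toy user-space model may make the suspected failure mode easier to see. It is a sketch under stated assumptions, not FreeBSD kernel code: struct toy_entry, toy_resolve() and toy_arplookup() are hypothetical names, and the has_llinfo flag merely stands in for the link-layer info that the early "break" in RTM_RESOLVE never attaches. The half-built entry stays in the table, so later lookups for the same address keep failing.

```c
/*
 * Toy user-space model (hypothetical, not kernel code) of a host entry
 * that is cloned but never receives link-layer info (llinfo).
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8

struct toy_entry {
    bool in_use;
    char dst[16];     /* IPv4 address in dotted-quad form */
    bool has_llinfo;  /* stands in for the ARP llinfo the kernel attaches */
};

static struct toy_entry toy_table[TOY_MAX_ENTRIES];

/* Return the existing entry for dst, or NULL if there is none. */
static struct toy_entry *toy_find(const char *dst)
{
    for (size_t i = 0; i < TOY_MAX_ENTRIES; i++)
        if (toy_table[i].in_use && strcmp(toy_table[i].dst, dst) == 0)
            return &toy_table[i];
    return NULL;
}

/*
 * Clone a new entry for dst.  When bad_gateway is true, bail out before
 * attaching llinfo (like the early "break" in RTM_RESOLVE) but leave the
 * entry in the table: nothing cleans it up afterwards.
 */
static struct toy_entry *toy_resolve(const char *dst, bool bad_gateway)
{
    assert(strlen(dst) < sizeof toy_table[0].dst);
    for (size_t i = 0; i < TOY_MAX_ENTRIES; i++) {
        struct toy_entry *e = &toy_table[i];
        if (e->in_use)
            continue;
        e->in_use = true;
        strcpy(e->dst, dst);
        e->has_llinfo = !bad_gateway;
        return e;
    }
    return NULL;  /* table full */
}

/* Succeeds only if an entry with llinfo exists or can be created. */
static bool toy_arplookup(const char *dst, bool bad_gateway)
{
    struct toy_entry *e = toy_find(dst);
    if (e == NULL)
        e = toy_resolve(dst, bad_gateway);
    return e != NULL && e->has_llinfo;
}
```

In this model, a first toy_arplookup("128.95.8.74", true) fails, and a second call for the same address with a good gateway still fails because the stale entry is found before any new resolution is attempted; a fresh address resolves normally.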

>Fix:
I don't completely understand the ARP code, so I may lack the insight
to really correct the problem, but the following patch seems to work
around it ("bad gateway value" is still logged, but the llinfo messages
no longer appear and ARP works for the address that caused the message):

--- if_ether.c  2001/07/23 16:35:07     1.1
+++ if_ether.c  2001/07/23 19:13:24
@@ -199,7 +199,13 @@
        case RTM_RESOLVE:
                if (gate->sa_family != AF_LINK ||
                    gate->sa_len < sizeof(null_sdl)) {
-                       log(LOG_DEBUG, "arp_rtrequest: bad gateway value\n");
+                       log(LOG_DEBUG, "arp_rtrequest: %s bad gateway value %s\n",
+                           inet_ntoa(SIN(rt_key(rt))->sin_addr),
+                           gate->sa_family != AF_LINK? "family": "");
+                       rtrequest(RTM_DELETE,
+                                 (struct sockaddr *)rt_key(rt),
+                                 rt->rt_gateway,
+                                 rt_mask(rt), rt->rt_flags, 0);
                        break;
                }
                SDL(gate)->sdl_type = rt->rt_ifp->if_type;
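The idea behind the patch above (remove the route when RTM_RESOLVE rejects the gateway, so the next attempt starts from scratch) can be sketched in the same toy style. This is again a hypothetical user-space model, not the kernel code: fix_resolve() and fix_arplookup() are illustrative names, and fix_resolve() simply marks its freshly cloned slot unused where the patch calls rtrequest(RTM_DELETE, ...).

```c
/*
 * Toy user-space model (hypothetical, not kernel code) of the patched
 * behaviour: a rejected entry is deleted instead of being left behind.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FIX_MAX_ENTRIES 8

struct fix_entry {
    bool in_use;
    char dst[16];     /* IPv4 address in dotted-quad form */
    bool has_llinfo;  /* stands in for the ARP llinfo */
};

static struct fix_entry fix_table[FIX_MAX_ENTRIES];

/* Return the existing entry for dst, or NULL if there is none. */
static struct fix_entry *fix_find(const char *dst)
{
    for (size_t i = 0; i < FIX_MAX_ENTRIES; i++)
        if (fix_table[i].in_use && strcmp(fix_table[i].dst, dst) == 0)
            return &fix_table[i];
    return NULL;
}

/* Clone an entry for dst; on a bad gateway, delete it again. */
static struct fix_entry *fix_resolve(const char *dst, bool bad_gateway)
{
    assert(strlen(dst) < sizeof fix_table[0].dst);
    for (size_t i = 0; i < FIX_MAX_ENTRIES; i++) {
        struct fix_entry *e = &fix_table[i];
        if (e->in_use)
            continue;
        e->in_use = true;
        strcpy(e->dst, dst);
        if (bad_gateway) {
            e->in_use = false;  /* models rtrequest(RTM_DELETE, ...) */
            return NULL;
        }
        e->has_llinfo = true;
        return e;
    }
    return NULL;  /* table full */
}

/* Succeeds only if an entry with llinfo exists or can be created. */
static bool fix_arplookup(const char *dst, bool bad_gateway)
{
    struct fix_entry *e = fix_find(dst);
    if (e == NULL)
        e = fix_resolve(dst, bad_gateway);
    return e != NULL && e->has_llinfo;
}
```

Here the first lookup with a bad gateway still fails, but because nothing stale is left in the table, a later lookup for the same address resolves normally.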


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: ru 
State-Changed-When: Thu Oct 18 07:20:34 PDT 2001 
State-Changed-Why:  
Do you have a routed(8) daemon running? 


Responsible-Changed-From-To: freebsd-bugs->ru 
Responsible-Changed-By: ru 
Responsible-Changed-When: Thu Oct 18 07:20:34 PDT 2001 
Responsible-Changed-Why:  
I can easily reproduce this with routed(8) and route(8), 
and understand what's going on, but not sure if this is 
the routed(8) problem or kernel's. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=29170 

From: Paul Herman <pherman@frenchfries.net>
To: Voradesh Yenbut <yenbut@cs.washington.edu>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG, <ru@FreeBSD.ORG>
Subject: Re: kern/29170: ARP request fails after "bad gateway value" in
 if_ether.c
Date: Wed, 28 Nov 2001 16:22:08 -0800 (PST)

 The following patch (against 4.4-RELEASE) solves this problem.  In
 -CURRENT it's a little different, but the same if condition should
 apply, as long as it appears before the rt_setgate() statement.
 
 Voradesh, does this solve your problem?
 
 -Paul.
 
 Index: sys/net/rtsock.c
 ===================================================================
 RCS file: /mnt/ncvs/src/sys/net/rtsock.c,v
 retrieving revision 1.44.2.4
 diff -u -r1.44.2.4 rtsock.c
 --- sys/net/rtsock.c	2001/07/11 09:37:37	1.44.2.4
 +++ sys/net/rtsock.c	2001/11/27 01:33:03
 @@ -399,6 +399,14 @@
  			break;
 
  		case RTM_CHANGE:
 +			/* Don't let the user specify non-link information
 +			 * for a gateway if the RTF_LLINFO flag is set.
 +			 * We'll just leave the gateway alone.
 +			 */
 +			if (gate && (rt->rt_flags & RTF_LLINFO) &&
 +			    gate->sa_family != AF_LINK)
 +				gate = rt->rt_gateway;
 +
  			if (gate && (error = rt_setgate(rt, rt_key(rt), gate)))
  				senderr(error);
 
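Paul's RTM_CHANGE guard can be modeled in isolation as below. This is only a sketch: MODEL_AF_INET, MODEL_AF_LINK, MODEL_RTF_LLINFO and the model_* structures are hypothetical stand-ins for the real sockaddr families, the RTF_LLINFO flag and struct rtentry, and a plain pointer assignment takes the place of rt_setgate().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in FreeBSD headers. */
enum model_family { MODEL_AF_INET, MODEL_AF_LINK };
#define MODEL_RTF_LLINFO 0x1u  /* arbitrary bit for this model */

struct model_sockaddr {
    enum model_family family;
};

struct model_rtentry {
    unsigned flags;
    const struct model_sockaddr *gateway;
};

/*
 * Model of the patched RTM_CHANGE path: for an entry carrying link-layer
 * info, a non-AF_LINK gateway is ignored and the old one is kept.
 * Returns the gateway the entry ends up with.
 */
static const struct model_sockaddr *
model_rtm_change(struct model_rtentry *rt, const struct model_sockaddr *gate)
{
    assert(rt != NULL);
    if (gate != NULL && (rt->flags & MODEL_RTF_LLINFO) &&
        gate->family != MODEL_AF_LINK)
        gate = rt->gateway;   /* leave the gateway alone */
    if (gate != NULL)
        rt->gateway = gate;   /* stands in for rt_setgate() */
    return rt->gateway;
}
```

An LLINFO entry keeps its link-layer gateway when handed an AF_INET one, while an entry without the flag accepts the change as before.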

From: Voradesh Yenbut <yenbut@cs.washington.edu>
To: Paul Herman <pherman@frenchfries.net>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG, ru@FreeBSD.ORG
Subject: Re: kern/29170: ARP request fails after "bad gateway value" in if_ether.c 
Date: Thu, 29 Nov 2001 15:31:17 -0800

 Thanks for the patch.  Unfortunately, it did not solve my problem.
 
 The kernel was upgraded from 4.2 to 4.4 with the patch applied.  After a
 while the usual error messages were printed, and no communication with the
 IP addresses listed in the messages was possible afterward.
 
 Below is an example of the messages (192.168.8.85 is an HP LaserJet 4M and
 128.95.8.25 is a win2k machine).
 
 
 Nov 29 14:58:08 bs8 /kernel: arp_rtrequest: bad gateway value
 Nov 29 14:58:08 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
 Nov 29 14:58:08 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt
 
 Nov 29 14:58:31 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
 Nov 29 14:58:31 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt
 Nov 29 14:58:31 bs8 /kernel: arplookup 192.168.8.85 failed: could not allocate llinfo
 Nov 29 14:58:31 bs8 /kernel: arpresolve: can't allocate llinfo for 192.168.8.85rt
 
 Nov 29 15:10:22 bs8 /kernel: arp_rtrequest: bad gateway value
 Nov 29 15:10:22 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
 Nov 29 15:10:22 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
 Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
 Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
 Nov 29 15:10:29 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
 Nov 29 15:10:29 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
 Nov 29 15:10:29 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
 Nov 29 15:10:33 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
 Nov 29 15:10:33 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
 Nov 29 15:10:45 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
 Nov 29 15:10:45 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
 Nov 29 15:10:46 bs8 /kernel: arplookup 128.95.8.25 failed: could not allocate llinfo
 Nov 29 15:10:46 bs8 /kernel: arpresolve: can't allocate llinfo for 128.95.8.25rt
 

From: Ruslan Ermilov <ru@FreeBSD.ORG>
To: Voradesh Yenbut <yenbut@cs.washington.edu>
Cc: Paul Herman <pherman@frenchfries.net>,
	FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/29170: ARP request fails after "bad gateway value" in if_ether.c
Date: Fri, 30 Nov 2001 15:19:25 +0200

 On Thu, Nov 29, 2001 at 03:31:17PM -0800, Voradesh Yenbut wrote:
 > Thanks for the patch.  Unfortunately, it did not solve my problem.
 > 
 > The kernel was changed from 4.2 to 4.4 with the patch. After a while
 > the usual error messages were printed, and no communication to IP addresses
 > listed in the message was possible afterward.
 > 
 > Below is an example of messages (192.168.8.85 is an HP LaserJet 4M and 
 > 128.95.8.25 is a win2k machine.)
 > 
 Your routing table is screwed.  These "can't allocate llinfo" messages say so.
 
 
 Cheers,
 -- 
 Ruslan Ermilov		Oracle Developer/DBA,
 ru@sunbay.com		Sunbay Software AG,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.512.251	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age

From: Ruslan Ermilov <ru@FreeBSD.org>
To: Paul Herman <pherman@frenchfries.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/29170: ARP request fails after "bad gateway value" in if_ether.c
Date: Sat, 1 Dec 2001 19:28:55 +0200

 On Wed, Nov 28, 2001 at 04:22:08PM -0800, Paul Herman wrote:
 > 
 > The following patch (against 4.4-RELEASE) solves this problem.  In
 > -CURRENT it's a little different, but the same if condition should
 > apply, as long as it appears before the rt_setgate() statement.
 > 
 > Voradesh, does this solve your problem?
 > 
 > -Paul.
 > 
 > Index: sys/net/rtsock.c
 > ===================================================================
 > RCS file: /mnt/ncvs/src/sys/net/rtsock.c,v
 > retrieving revision 1.44.2.4
 > diff -u -r1.44.2.4 rtsock.c
 > --- sys/net/rtsock.c	2001/07/11 09:37:37	1.44.2.4
 > +++ sys/net/rtsock.c	2001/11/27 01:33:03
 > @@ -399,6 +399,14 @@
 >  			break;
 > 
 >  		case RTM_CHANGE:
 > +			/* Don't let the user specify non-link information
 > +			 * for a gateway if the RTF_LLINFO flag is set.
 > +			 * We'll just leave the gateway alone.
 > +			 */
 > +			if (gate && (rt->rt_flags & RTF_LLINFO) &&
 > +			    gate->sa_family != AF_LINK)
 > +				gate = rt->rt_gateway;
 > +
 >  			if (gate && (error = rt_setgate(rt, rt_key(rt), gate)))
 >  				senderr(error);
 > 
 Paul,
 
 If we deny this combo for RTM_CHANGE, we should then deny it for RTM_ADD
 as well.  For example, "route add -host 1.2.3.4 5.6.7.8 -llinfo" shouldn't
 create an RTF_LLINFO entry with an AF_INET gateway.  Perhaps in this case
 (RTM_ADD), the code should return EINVAL.
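Ruslan's suggestion for RTM_ADD could look roughly like the check below. It is a hypothetical sketch: add_check_gateway(), ADD_AF_INET, ADD_AF_LINK and ADD_RTF_LLINFO are illustrative names, not FreeBSD identifiers, and only EINVAL comes from the real <errno.h>. With such a check, the "route add -host 1.2.3.4 5.6.7.8 -llinfo" example (LLINFO set, AF_INET gateway) would be refused.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins, not FreeBSD identifiers. */
enum add_family { ADD_AF_INET, ADD_AF_LINK };
#define ADD_RTF_LLINFO 0x1u  /* arbitrary bit for this model */

/*
 * Proposed RTM_ADD check: an RTF_LLINFO route must have a link-layer
 * (AF_LINK) gateway.  Returns 0 if acceptable, EINVAL otherwise.
 */
static int add_check_gateway(unsigned flags, enum add_family gw_family)
{
    assert(gw_family == ADD_AF_INET || gw_family == ADD_AF_LINK);
    if ((flags & ADD_RTF_LLINFO) && gw_family != ADD_AF_LINK)
        return EINVAL;
    return 0;
}
```

Returning an error here, rather than silently keeping the old gateway as the RTM_CHANGE guard does, makes sense for RTM_ADD because there is no previous gateway to fall back on.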
 
 
 Cheers,
 -- 
 Ruslan Ermilov

From: Paul Herman <pherman@frenchfries.net>
To: Ruslan Ermilov <ru@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/29170: ARP request fails after "bad gateway value" in
 if_ether.c
Date: Sat, 1 Dec 2001 15:02:02 -0800 (PST)

 On Sat, 1 Dec 2001, Ruslan Ermilov wrote:
 
 > On Wed, Nov 28, 2001 at 04:22:08PM -0800, Paul Herman wrote:
 > >
 > > The following patch (against 4.4-RELEASE) solves this problem.  In
 > > -CURRENT it's a little different, but the same if condition should
 > > apply, as long as it appears before the rt_setgate() statement.
 >
 > If we deny this combo for RTM_CHANGE, we should then deny it for
 > RTM_ADD as well.  For example, "route add -host 1.2.3.4 5.6.7.8
 > -llinfo" shouldn't create RTF_LLINFO entry with AF_INET gateway.
 > Perhaps in this case (RTM_ADD), the code should return EINVAL.
 
 Hi Ruslan,
 
 Yes.  In fact, it should ideally be in rt_setgate(), which would
 catch all cases.  The reason I didn't do this is that the IPv6
 stack, as I found out, *does* put AF_INET information as a gateway
 with the LLINFO bit set. :-( This is why I went conservative and
 only made a small change.
 
 Adding it to RTM_ADD would, I think, be a good thing, and returning
 EINVAL should be OK as long as it works with routed (I haven't
 checked).
 
 -Paul.
 
State-Changed-From-To: feedback->open 
State-Changed-By: remko 
State-Changed-When: Sun Nov 12 10:36:34 UTC 2006 
State-Changed-Why:  
Reset state to open; feedback had been received a while ago. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=29170 
State-Changed-From-To: open->feedback 
State-Changed-By: remko 
State-Changed-When: Wed Dec 13 14:29:12 UTC 2006 
State-Changed-Why:  
steal this ticket from ru to obtain feedback about the 
current status of this problem (I will bring it back 
to ruslan with more information if possible :-)). 


Responsible-Changed-From-To: ru->remko 
Responsible-Changed-By: remko 
Responsible-Changed-When: Wed Dec 13 14:29:12 UTC 2006 
Responsible-Changed-Why:  
Grab the ticket from ru so that I can trace the feedback. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=29170 
State-Changed-From-To: feedback->closed 
State-Changed-By: remko 
State-Changed-When: Thu Feb 21 17:57:03 UTC 2008 
State-Changed-Why:  
Feedback timeout (never received) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=29170 
>Unformatted:
