From nobody@FreeBSD.org  Wed Apr 12 19:18:39 2000
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21])
	by hub.freebsd.org (Postfix) with ESMTP id AC0DE37BB97
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 12 Apr 2000 19:18:39 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.9.3/8.9.2) id TAA12378;
	Wed, 12 Apr 2000 19:18:39 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Message-Id: <200004130218.TAA12378@freefall.freebsd.org>
Date: Wed, 12 Apr 2000 19:18:39 -0700 (PDT)
From: brian@pocketscience.com
Sender: nobody@FreeBSD.org
To: freebsd-gnats-submit@FreeBSD.org
Subject: NATD appears to memory leak when a connection fails from the internal network to the external network.
X-Send-Pr-Version: www-1.0

>Number:         17963
>Category:       bin
>Synopsis:       NATD appears to memory leak when a connection fails from the internal network to the external network.
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Apr 12 19:20:01 PDT 2000
>Closed-Date:    Mon Apr 17 02:12:53 PDT 2000
>Last-Modified:  Mon Apr 17 02:13:47 PDT 2000
>Originator:     Brian Nelson
>Release:        3.4-STABLE
>Organization:
PocketScience, Inc
>Environment:
FreeBSD vpn1.pocketmail.com 3.4-STABLE FreeBSD 3.4-STABLE #0: Wed Apr 12 11:14:46 PDT 2000     notgod@vpn1.pocketmail.com:/usr/src/sys/compile/VPN  i386

>Description:
In production, we are making a large number of connection attempts to do 
AOL polling.  A significant number of them fail to connect (a bug on our 
end).  Since we noticed this behavior, we have also noticed that natd 
leaks memory, and quite significantly.

We're pulling ~50k connections/hour.  It takes ~16 hours for the 
daemon to leak enough that the network dies on the machine, until 
you restart natd.
>How-To-Repeat:
Set up natd.

From an internal machine, make several network connections that are 
dropped by the remote end (not refused; the connection attempts time out).


>Fix:
None at this time.

>Release-Note:
>Audit-Trail:

From: Ruslan Ermilov <ru@ucb.crimea.ua>
To: brian@pocketscience.com
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: bin/17963: NATD appears to memory leak when a connection fails from the internal network to the external network.
Date: Thu, 13 Apr 2000 18:04:38 +0300

 On Wed, Apr 12, 2000 at 07:18:39PM -0700, brian@pocketscience.com wrote:
 > 
 > In production, we are making several connection attempts to do AOL 
 > polling.  Some are getting a failure to connect (actually, a 
 > significant number are).  Since we have noticed this behavior (a bug 
 > on our end), we have also noticed that natd memory leaks, actually 
 > pretty significantly.
 > 
 > We're pulling ~50k connections/hour.  It takes ~16 hours for the 
 > daemon to leak enough that the network dies on the machine, until 
 > you restart natd.
 > 
 Are these TCP connections?  (I will assume that they are below).
 Are these connections to the same remote machine/port?
 Are these connections from the same local machine/port?
 
 > >How-To-Repeat:
 > Set up natd.
 > 
 > from an internal machine, make several network connections that get 
 > dropped on the remote end (not denied, but connection timeouts)
 > 
 It is unclear what you mean.  Do these connections get established,
 and then single-dropped by the remote end, or not established at all?
 In the first case, turning on and tuning a system-wide TCP keepalive
 on the client side might help.  Do you have it enabled?  What are the
 values of net.inet.tcp.*keep* MIB variables?
 
 Did you try running natd(8) with the -log option, and monitoring the
 memory usage by `tail -f /var/log/alias.log'?
 
 
 -- 
 Ruslan Ermilov		Sysadmin and DBA of the
 ru@ucb.crimea.ua	United Commercial Bank,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.247.647	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age
 

From: Brian Nelson <brian@pocketscience.com>
To: Ruslan Ermilov <ru@ucb.crimea.ua>
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: bin/17963: NATD appears to memory leak when a connection fails from 
 the internal network to the external network.
Date: Thu, 13 Apr 2000 13:34:15 -0700

 Ruslan Ermilov wrote:
 > 
 > On Wed, Apr 12, 2000 at 07:18:39PM -0700, brian@pocketscience.com wrote:
 > >
 > > In production, we are making several connection attempts to do AOL
 > > polling.  Some are getting a failure to connect (actually, a
 > > significant number are).  Since we have noticed this behavior (a bug
 > > on our end), we have also noticed that natd memory leaks, actually
 > > pretty significantly.
 > >
 > > We're pulling ~50k connections/hour.  It takes ~16 hours for the
 > > daemon to leak enough that the network dies on the machine, until
 > > you restart natd.
 > >
 > Are these TCP connections?  (I will assume that they are below).
 > Are these connections to the same remote machine/port?
 > Are these connections from the same local machine/port?
 
 Yes, I am sorry, they are TCP connections.
 They are all connecting to americaonline.aol.com (this is DNS
 load-balanced) port 5190 (aol in /etc/services)
 The local port changes, since it's ~ 100 processes each on 7 internal
 machines.
 
 > 
 > > >How-To-Repeat:
 > > Set up natd.
 > >
 > > from an internal machine, make several network connections that get
 > > dropped on the remote end (not denied, but connection timeouts)
 > >
 > It is unclear what you mean.  Do these connections get established,
 > and then single-dropped by the remote end, or not established at all?
 > In the first case, turning on and tuning a system-wide TCP keepalive
 > on the client side might help.  Do you have it enabled?  What are the
 > values of net.inet.tcp.*keep* MIB variables?
 
 They're never established.  They fail to connect.  A tcpdump
 shows a lot of SYNs and very few FINs.  The client machines are
 Solaris, so I am not sure how to do any TCP tuning.
 
 > 
 > Did you try running natd(8) with -log option, and monitoring the
 > memory usage by `tail -f /var/log/alias.log'?
 
 I will see if I can do this.  On another note, I added an ipfw rule to
 keep state for all these connections, and now it's not leaking the way it
 was before.  (ipfw add 50 allow tcp from any to any 5190 keep-state)
 
 Please note, I am not a TCP hacker, and I am learning these things as I
 go along.  I really appreciate your help here, friend.  Thank you so
 much.
 
 

From: Ruslan Ermilov <ru@ucb.crimea.ua>
To: Brian Nelson <brian@pocketscience.com>
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: bin/17963: NATD appears to memory leak when a connection fails from the internal network to the external network.
Date: Fri, 14 Apr 2000 10:33:22 +0300

 On Thu, Apr 13, 2000 at 01:34:15PM -0700, Brian Nelson wrote:
 > Ruslan Ermilov wrote:
 > > 
 > > On Wed, Apr 12, 2000 at 07:18:39PM -0700, brian@pocketscience.com wrote:
 > > >
 > > > In production, we are making several connection attempts to do AOL
 > > > polling.  Some are getting a failure to connect (actually, a
 > > > significant number are).  Since we have noticed this behavior (a bug
 > > > on our end), we have also noticed that natd memory leaks, actually
 > > > pretty significantly.
 > > >
 > > > We're pulling ~50k connections/hour.  It takes ~16 hours for the
 > > > daemon to leak enough that the network dies on the machine, until
 > > > you restart natd.
 > > >
 > > Are these TCP connections?  (I will assume that they are below).
 > > Are these connections to the same remote machine/port?
 > > Are these connections from the same local machine/port?
 > 
 > Yes, I am sorry, they are TCP connections.
 > They are all connecting to americaonline.aol.com (this is DNS
 > load-balanced) port 5190 (aol in /etc/services)
 > The local port changes, since it's ~ 100 processes each on 7 internal
 > machines.
 > 
 > > 
 > > > >How-To-Repeat:
 > > > Set up natd.
 > > >
 > > > from an internal machine, make several network connections that get
 > > > dropped on the remote end (not denied, but connection timeouts)
 > > >
 > > It is unclear what you mean.  Do these connections get established,
 > > and then single-dropped by the remote end, or not established at all?
 > > In the first case, turning on and tuning a system-wide TCP keepalive
 > > on the client side might help.  Do you have it enabled?  What are the
 > > values of net.inet.tcp.*keep* MIB variables?
 > 
 > They're never established.  They fail to connect.  A tcpdump
 > shows a lot of SYNs and very few FINs.  The client machines are
 > Solaris, so I am not sure how to do any TCP tuning.
 > 
 I probably have a solution for you, but I need to know some details.
 
 Who (under normal circumstances) closes the connection (sends FIN)?
 Client or server?
 
 Also, I would like to take a look at a tcpdump(1) log of one of these
 failing connections (without your keep-state rule for ipfw(8)).  The
 failing connection should look like this: the client sends SYN and never
 gets either RST or SYN-ACK back from the server.
 
 
 Cheers,
 -- 
 Ruslan Ermilov		Sysadmin and DBA of the
 ru@ucb.crimea.ua	United Commercial Bank,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.247.647	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age
 

From: Ruslan Ermilov <ru@FreeBSD.org>
To: brian@pocketscience.com, brian@FreeBSD.org, cmott@scientech.com,
	net@FreeBSD.org
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/17963: NATD appears to memory leak when a connection fails from the internal network to the external network.
Date: Fri, 14 Apr 2000 12:17:59 +0300

 --ZGiS0Q5IWpPtfppv
 Content-Type: text/plain; charset=us-ascii
 
 On Wed, Apr 12, 2000 at 07:18:39PM -0700, brian@pocketscience.com wrote:
 > 
 [...]
 > from an internal machine, make several network connections that get 
 > dropped on the remote end (not denied, but connection timeouts)
 > 
 Please try the following patch.  It is for RELENG_3 (latest) sources.
 Extract the patch to the current directory, then follow these instructions:
 
 # mv ./p /tmp
 # cd /usr/src/lib/libalias
 # patch </tmp/p
 # make clean all install		# build/install new library
 # cd /usr/src/sbin/natd
 # make clean all install		# build/install natd with new library
 
 
 BACKGROUND
 
 The problem was that the TCP link's timeout was set to TCP_EXPIRE_CONNECTED
 (86400 secs) right after the first SYN from the client (or from the server
 for incoming connections).  With this change, this huge timeout value will
 only be applied to ESTABLISHED connections, i.e. only after a SYN has been
 seen from both the client and the server side.  TCP links corresponding to
 failed TCP connections (those which never receive either SYN-ACK or RST from
 the server) will be dropped after the TCP_EXPIRE_INITIAL (300 seconds) timeout.
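 
 As a minimal illustrative sketch (not the libalias code itself), the
 timeout selection described above can be modeled in C as follows.  The
 names sketch_link, sketch_init, old_syn_out, new_syn_out, new_syn_in and
 the two expire_after_* helpers are hypothetical; only the two timeout
 constants and their values come from the text above.  The sketch assumes
 that a new link starts with TCP_EXPIRE_INITIAL, and it leaves out the
 DISCONNECTED (FIN/RST) handling.

```c
#include <assert.h>
#include <stddef.h>

/* Timeouts in seconds; both values are quoted in the explanation above. */
#define TCP_EXPIRE_INITIAL   300
#define TCP_EXPIRE_CONNECTED 86400

/* Simplified per-direction state; the DISCONNECTED case is omitted. */
enum sketch_state { SKETCH_NOT_CONNECTED, SKETCH_CONNECTED };

/* Hypothetical stand-in for struct alias_link, reduced to what is needed. */
struct sketch_link {
    enum sketch_state in;   /* state seen from the server side */
    enum sketch_state out;  /* state seen from the client side */
    int expire_time;        /* seconds until the link may be dropped */
};

/* Assumption: a freshly created link starts with the short timeout. */
void sketch_init(struct sketch_link *l)
{
    assert(l != NULL);
    l->in = SKETCH_NOT_CONNECTED;
    l->out = SKETCH_NOT_CONNECTED;
    l->expire_time = TCP_EXPIRE_INITIAL;
}

/* Old behaviour: a single outgoing SYN already selects the long timeout. */
void old_syn_out(struct sketch_link *l)
{
    assert(l != NULL);
    l->expire_time = TCP_EXPIRE_CONNECTED;
    l->out = SKETCH_CONNECTED;
}

/* Patched behaviour: long timeout only once both directions have sent SYN. */
void new_syn_out(struct sketch_link *l)
{
    assert(l != NULL);
    if (l->in == SKETCH_CONNECTED)
        l->expire_time = TCP_EXPIRE_CONNECTED;
    l->out = SKETCH_CONNECTED;
}

void new_syn_in(struct sketch_link *l)
{
    assert(l != NULL);
    if (l->out == SKETCH_CONNECTED)
        l->expire_time = TCP_EXPIRE_CONNECTED;
    l->in = SKETCH_CONNECTED;
}

/* Expiry of a link that only ever saw the client's SYN (failed connect). */
int expire_after_lone_syn(int patched)
{
    struct sketch_link l;

    sketch_init(&l);
    if (patched)
        new_syn_out(&l);
    else
        old_syn_out(&l);
    return l.expire_time;
}

/* Expiry of a link that saw SYN from both sides under the patched logic. */
int expire_after_handshake(void)
{
    struct sketch_link l;

    sketch_init(&l);
    new_syn_out(&l);
    new_syn_in(&l);
    return l.expire_time;
}
```

 In this model, a lone client SYN leaves the link with the 86400-second
 expiry under the old logic, while under the patched logic it keeps the
 300-second initial value until a SYN has been seen in both directions.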
 
 
 Cheers,
 -- 
 Ruslan Ermilov		Sysadmin and DBA of the
 ru@ucb.crimea.ua	United Commercial Bank,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.247.647	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age
 
 --ZGiS0Q5IWpPtfppv
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=p
 
 Index: alias_db.c
 ===================================================================
 RCS file: /usr/FreeBSD-CVS/src/lib/libalias/alias_db.c,v
 retrieving revision 1.10.2.5
 diff -u -p -r1.10.2.5 alias_db.c
 --- alias_db.c	1999/12/21 00:04:09	1.10.2.5
 +++ alias_db.c	2000/04/14 08:34:44
 @@ -1538,22 +1538,19 @@ SetStateIn(struct alias_link *link, int 
      /* TCP input state */
      switch (state) {
      case ALIAS_TCP_STATE_DISCONNECTED:
 -        if (link->data.tcp->state.out != ALIAS_TCP_STATE_CONNECTED) {
 +        if (link->data.tcp->state.out != ALIAS_TCP_STATE_CONNECTED)
              link->expire_time = TCP_EXPIRE_DEAD;
 -        } else {
 +        else
              link->expire_time = TCP_EXPIRE_SINGLEDEAD;
 -        }
 -        link->data.tcp->state.in = state;
          break;
      case ALIAS_TCP_STATE_CONNECTED:
 -        link->expire_time = TCP_EXPIRE_CONNECTED;
 -        /*FALLTHROUGH*/
 -    case ALIAS_TCP_STATE_NOT_CONNECTED:
 -        link->data.tcp->state.in = state;
 +        if (link->data.tcp->state.out == ALIAS_TCP_STATE_CONNECTED)
 +            link->expire_time = TCP_EXPIRE_CONNECTED;
          break;
      default:
          abort();
      }
 +    link->data.tcp->state.in = state;
  }
  
  
 @@ -1563,22 +1560,19 @@ SetStateOut(struct alias_link *link, int
      /* TCP output state */
      switch (state) {
      case ALIAS_TCP_STATE_DISCONNECTED:
 -        if (link->data.tcp->state.in != ALIAS_TCP_STATE_CONNECTED) {
 +        if (link->data.tcp->state.in != ALIAS_TCP_STATE_CONNECTED)
              link->expire_time = TCP_EXPIRE_DEAD;
 -        } else {
 +        else
              link->expire_time = TCP_EXPIRE_SINGLEDEAD;
 -        }
 -        link->data.tcp->state.out = state;
          break;
      case ALIAS_TCP_STATE_CONNECTED:
 -        link->expire_time = TCP_EXPIRE_CONNECTED;
 -        /*FALLTHROUGH*/
 -    case ALIAS_TCP_STATE_NOT_CONNECTED:
 -        link->data.tcp->state.out = state;
 +        if (link->data.tcp->state.in == ALIAS_TCP_STATE_CONNECTED)
 +            link->expire_time = TCP_EXPIRE_CONNECTED;
          break;
      default:
          abort();
      }
 +    link->data.tcp->state.out = state;
  }
  
  
 
 --ZGiS0Q5IWpPtfppv--
 

From: Brian Nelson <brian@pocketscience.com>
To: Ruslan Ermilov <ru@FreeBSD.org>
Cc: brian@FreeBSD.org, cmott@scientech.com, net@FreeBSD.org,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/17963: NATD appears to memory leak when a connection fails from 
 the internal network to the external network.
Date: Fri, 14 Apr 2000 14:25:12 -0700

 This seems to have worked!  It has been running for hours, and we're still at
 ~600k.
 
 Thanks a lot for your help!  Is this going into -current or -stable any
 time soon?
 
 Ruslan Ermilov wrote:
 > 
 > On Wed, Apr 12, 2000 at 07:18:39PM -0700, brian@pocketscience.com wrote:
 > >
 > [...]
 > > from an internal machine, make several network connections that get
 > > dropped on the remote end (not denied, but connection timeouts)
 > >
 > Please try the following patch.  It is for RELENG_3 (latest) sources.
 > Extract the patch to the current directory, then follow these instructions:
 > 
 > # mv ./p /tmp
 > # cd /usr/src/lib/libalias
 > # patch </tmp/p
 > # make clean all install                # build/install new library
 > # cd /usr/src/sbin/natd
 > # make clean all install                # build/install natd with new library
 > 
 > BACKGROUND
 > 
 > The problem was that the TCP link's timeout was set to TCP_EXPIRE_CONNECTED
 > (86400 secs) right after the first SYN from the client (or from the server
 > for incoming connections).  With this change, this huge timeout value will
 > only be applied to ESTABLISHED connections, i.e. only after SYN was seen
 > from both client and server side.  TCP links corresponding to failed TCP
 > connections (those which never receive either SYN-ACK or RST from the server),
 > will be dropped after TCP_EXPIRE_INITIAL (300 seconds) timeout.
 > 
 
State-Changed-From-To: open->closed 
State-Changed-By: ru 
State-Changed-When: Mon Apr 17 02:12:53 PDT 2000 
State-Changed-Why:  
Fixed in 5.0-CURRENT, 4.0-STABLE and 3.4-STABLE, file 
src/lib/libalias/alias_db.c, revisions 1.26, 1.21.2.2 
and 1.10.2.6 respectively. 
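
For a rough sense of scale, here is a back-of-the-envelope sketch in C.
It assumes, as a worst case, that every one of the ~50k connection
attempts per hour from the description fails and leaves one link behind;
the real failure fraction is not given in this report.  The function name
max_live_links is hypothetical.

```c
#include <assert.h>

/*
 * Upper bound on links held at once: attempts arrive at a steady rate and
 * each link lives for timeout_secs, so at most
 * rate * min(elapsed, timeout) links can be alive at any moment.
 */
long long max_live_links(long long attempts_per_hour,
                         long long timeout_secs,
                         long long elapsed_secs)
{
    long long window;

    assert(attempts_per_hour >= 0 && timeout_secs >= 0 && elapsed_secs >= 0);
    /* Links older than the timeout have already been dropped. */
    window = elapsed_secs < timeout_secs ? elapsed_secs : timeout_secs;
    return attempts_per_hour * window / 3600;
}
```

Under that assumption, max_live_links(50000, 86400, 16 * 3600) gives
800000 links after the ~16 hours mentioned in the description, while
max_live_links(50000, 300, 16 * 3600) gives about 4166 with the
300-second timeout.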
>Unformatted:
