From nemesis!uhclem  Tue Dec 10 19:29:25 1996
Received: from nemesis.lonestar.org (fw19-26.ppp.iadfw.net [207.136.16.59])
          by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id TAA04029
          for <FreeBSD-gnats-submit>; Tue, 10 Dec 1996 19:29:22 -0800 (PST)
Received: by nemesis.lonestar.org (Smail3.1.27.1 #22)
	id m0vXfG1-000uApC; Tue, 10 Dec 96 21:22 CST
Message-Id: <m0vXfG1-000uApC@nemesis.lonestar.org>
Date: Tue, 10 Dec 96 21:22 CST
From: uhclem@nemesis.lonestar.org
Reply-To: uhclem@nemesis.lonestar.org
To: FreeBSD-gnats-submit@freebsd.org
Cc: uhclem@nemesis.lonestar.org
Subject: syslogd stops logging after several hours of load - FDIV048
X-Send-Pr-Version: 3.2

>Number:         2191
>Category:       bin
>Synopsis:       syslogd stops logging after several hours of load - FDIV048
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Dec 10 19:30:02 PST 1996
>Closed-Date:    Sat May 30 15:45:29 PDT 1998
>Last-Modified:  Sat May 30 15:47:04 PDT 1998
>Originator:     Frank Durda IV (uhclem@nemesis.lonestar.org)
>Release:        FreeBSD 2.2-ALPHA i386
>Organization:
>Environment:

[FDIV048]

Pentium 100 system, 32Meg, installed 2.2-ALPHA, with several accounts,
receiving the syslog output of 100 Livingston terminal servers,
five FreeBSD boxes, and five Digital Alpha systems running OSF.

>Description:

With login, logout and other traffic you would expect from an ISP
with ~40,000 customers (about ten syslog messages/sec), after five to
six hours, syslogd stops writing syslog messages to disk.  The syslogd
process continues to accumulate CPU time and appears in the run state
from time to time, but there is no output, either to the terminals
logged-in as root, or to disk.

If syslogd is killed and restarted, logging resumes for five to six hours
and then stops again.

FYI, syslogd output was not directed to unused VTYs as in the other
recently reported syslogd problem.

This same platform with identical load had been running 2.1.5 (and
2.1.0 before that) for months and did not experience this problem with
syslogd.  After two days, we reverted the entire platform back to 2.1.5.
(I had warned management against putting 2.2-Alpha in a production
 environment, but they did it anyway because they wanted to write the
 logs on write-once CDs from time to time. )

>How-To-Repeat:

I tried to come up with a simplified way to reproduce this.  I tried
flooding syslogd with messages generated semi-randomly by using several
console logins with scripts that tried rlogins for accounts that didn't
exist.  The script was:
		while true
		do
		echo "logout" | rlogin -l foo1 skaro
		echo "logout" | rlogin -l bar1 skaro
		echo "logout" | rlogin -l foo2 skaro
		echo "logout" | rlogin -l bar2 skaro
		echo "logout" | rlogin -l foo3 skaro
		echo "logout" | rlogin -l bar3 skaro
		done
running on four screens, three copies per screen running in background.

Despite letting this run for 24 hours, syslogd continued to log
as it should.  However, I did notice some unusual activity in the
syslogd process.   After booting the system and before starting to beat
on syslogd, VSZ==196 and RSS==496.

Once the load started being placed on syslogd, these values changed
over time:
	VSZ	RSS
	196	496	(Starting values)
	196	424	(within ten minutes)
	196	412	(30 minutes later)
	196	368	(after eight hours)
	196	360	(after four hours)
	196	356	(after four hours)
At this point, the load was removed.  After ten minutes,
the values went to:
	196	360
and then
	196	376

I have no idea if the changes in RSS show anything, but it seemed odd
that the values should get *smaller* as time went by, as the load was
pretty much constant in the test.   It seems it should get larger
or stay the same size.

>Fix:
	
It is possible that the other problems recently reported with syslogd are
related, but syslogd output is not being directed to an unused console
in this case.

>Release-Note:
>Audit-Trail:

From: J Wunsch <j@uriah.heep.sax.de>
To: uhclem@nemesis.lonestar.org
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: bin/2191: syslogd stops logging after several hours of load - FDIV048
Date: Wed, 11 Dec 1996 09:54:09 +0100 (MET)

 As uhclem@nemesis.lonestar.org wrote:
 
 > (I had warned management against putting 2.2-Alpha in a production
 >  environment, but they did it anyway because they wanted to write the
 >  logs on write-once CDs from time to time. )
 
 (You could still use the 2.2 machine just for the CD-R part.  You
 gotta run mkisofs on the logs anyway, so this can be done across an
 NFS mount.)
 
 > >How-To-Repeat:
 > 
 > I tried to come up with a simplified way to reproduce this.  I tried
 
 > Despite letting this run for 24 hours, syslogd continued to log
 > as it should.
 
 Well, then chances are pretty low that anybody other than you will ever
 be able to reproduce and debug it.  So unless it will be fixed
 incidentally as a side-effect of another fix, the bug will remain
 forever. :-(
 
 Is there any chance that you could run a debugger on syslogd once it stops
 writing?  (Remember, you can use the `attach' command in gdb to attach
 to a running process.)
 
 -- 
 cheers, J"org
 
 joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
 Never trust an operating system you don't have sources for. ;-)

From: Sakari Jalovaara <sja@tekla.fi>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Re: bin/2191: syslogd stops logging after several hours of load - FDIV048
Date: Thu, 7 May 1998 15:11:55 +0300

 >Synopsis: syslogd stops logging after several hours of load - FDIV048
 
 One possibility for syslogd's erratic behavior is that it does
 complicated things in signal handlers.
 
 Seems to me it can end up recursively calling functions that break due
 to static variables or getting mixed up with linked list handling (ouch).
 
 bin/5548 seems related, maybe other syslogd-related ones too (bin/6216?)
 
 									++sja
State-Changed-From-To: open->closed 
State-Changed-By: steve 
State-Changed-When: Sat May 30 15:45:29 PDT 1998 
State-Changed-Why:  
PR #5548 seems to be a description of the same problem. 
Anyone with plenty of free time on their hands want to 
look into cleaning up syslogd's signal handling routines? 
>Unformatted:
