From chris@borderware.com  Tue Oct  5 21:50:27 2004
Return-Path: <chris@borderware.com>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D6BFB16A4CF
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  5 Oct 2004 21:50:27 +0000 (GMT)
Received: from mail.borderware.com (mail.borderware.com [207.236.65.231])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5BC1A43D49
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  5 Oct 2004 21:50:23 +0000 (GMT)
	(envelope-from chris@borderware.com)
Message-Id: <20041005215018.931B3ABB5@santana.borderware.com>
Date: Tue,  5 Oct 2004 17:50:18 -0400 (EDT)
From: Chris Gabe <chris@borderware.com>
Reply-To: Chris Gabe <chris@borderware.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: syslog overflow fix
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         72366
>Category:       bin
>Synopsis:       [patch] syslog overflow fix
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    glebius
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 05 22:00:48 GMT 2004
>Closed-Date:    Sun Dec 05 19:05:39 GMT 2004
>Last-Modified:  Sun Dec 05 19:05:39 GMT 2004
>Originator:     Chris Gabe
>Release:        FreeBSD 5.3 (or later)
>Organization:
Borderware Technologies Inc
>Environment:
System: FreeBSD 5.3

	
>Description:
	When local processes log rapidly through syslog(3), some messages are silently lost.
>How-To-Repeat:
	Write a program that calls syslog() with argv[2] in a loop of atoi(argv[1]) iterations, with the messages going to /var/log/messages.
	Run a few copies with arguments 10000000 aaaaaaaaaaaaaaaaaa so that syslogd is extremely busy.
	Then run one more copy with arguments 1000 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
	Use grep bbbb messages | wc to count how many of those lines reached the file.  Not all of them arrive every time.
>Fix:

	OpenBSD 3.2 fixed this in two parts.  The main part, below, is in libc, lib/libc/gen/syslog.c.
	Note that the change applies to the second send() attempt, not to the identical
	call a few lines above it:

    if (send(LogFile, tbuf, cnt, 0) >= 0)
        return;
 
    /*
     * If the send() failed, the odds are syslogd was restarted.
     * Make one (only) attempt to reconnect to /dev/log.
     */
    disconnectlog(); 
    connectlog(); 

<       if (send(LogFile, tbuf, cnt, 0) >= 0)
<               return;
---
>       do {    
>               usleep(1); 
>               if (send(LogFile, tbuf, cnt, 0) >= 0)
>                       return;
>       } while (errno == ENOBUFS);

In our experience this makes no discernible performance difference unless syslogd
is being seriously misused.
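
The retry logic of the libc change can be tried outside libc with a small
stand-alone sketch.  The names send_retry_enobufs() and demo_roundtrip() and
the max_tries cap are assumptions made for illustration; the patch itself
retries for as long as errno stays ENOBUFS and has no cap.  demo_roundtrip()
only exercises the success path over a socketpair(2).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libc): send one datagram and, while the
 * kernel reports ENOBUFS, sleep briefly and try again.  The max_tries cap
 * is an addition for this sketch.  Returns 0 on success, -1 on failure
 * with errno describing the last send() error.
 */
static int
send_retry_enobufs(int fd, const void *buf, size_t len, int max_tries)
{
	int tries;

	assert(buf != NULL && max_tries > 0);
	for (tries = 0; tries < max_tries; tries++) {
		if (send(fd, buf, len, 0) >= 0)
			return (0);
		if (errno != ENOBUFS)
			return (-1);	/* some other error: give up */
		usleep(1);		/* give the receiver time to drain */
	}
	errno = ENOBUFS;		/* usleep() may have changed errno */
	return (-1);
}

/* Send one datagram over a local socket pair and read it back. */
static int
demo_roundtrip(void)
{
	const char msg[] = "hello";
	char in[16];
	int sv[2], ok;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return (-1);
	ok = send_retry_enobufs(sv[0], msg, sizeof(msg), 10) == 0;
	n = ok ? recv(sv[1], in, sizeof(in), 0) : -1;
	ok = ok && n == (ssize_t)sizeof(msg) &&
	    memcmp(in, msg, sizeof(msg)) == 0;
	close(sv[0]);
	close(sv[1]);
	return (ok ? 0 : -1);
}
```

Capping the retries is a design choice of the sketch: it trades a possible
lost message for a bound on how long a caller can be held up.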

The second part is optional, but it often avoids the sleep in the code above
by doubling the receive buffer of each log socket.  It is in syslogd, in
main() of usr.sbin/syslogd/syslogd.c:

<       socklen_t len;
---
>       socklen_t len, slen;
479a480,487 (beware these numbers are quite different in 5.3)
    for (i = 0; i < nfunix; i++) {
        (void)unlink(funixn[i]);
        memset(&sunx, 0, sizeof(sunx));
        sunx.sun_family = AF_UNIX;
        (void)strlcpy(sunx.sun_path, funixn[i], sizeof(sunx.sun_path));
        funix[i] = socket(AF_UNIX, SOCK_DGRAM, 0);
        if (funix[i] < 0 ||
            bind(funix[i], (struct sockaddr *)&sunx,
             SUN_LEN(&sunx)) < 0 ||
            chmod(funixn[i], 0666) < 0) {
            (void)snprintf(line, sizeof line,
                    "cannot create %s", funixn[i]);
            logerror(line);
            dprintf("cannot create %s (%d)\n", funixn[i], errno);
            if (i == 0)
                die(0);
        }
>       slen = sizeof(len);
>       if (getsockopt(funix[i], SOL_SOCKET, SO_RCVBUF, &len,
>               &slen) == 0) {
>           len *= 2;
>           (void)setsockopt(funix[i], SOL_SOCKET, SO_RCVBUF, &len,
>                   slen);
>       }
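
The receive-buffer doubling can also be sketched as a stand-alone helper.
The names get_rcvbuf() and double_rcvbuf() are hypothetical.  The sketch
uses an int for the option value and sets the length argument to
sizeof(size) before each getsockopt(2) call, because that argument is
value-result.  Whether the kernel grants the larger size depends on system
limits, so the helper reports the size actually in effect afterwards.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: read the current SO_RCVBUF, or -1 on error. */
static int
get_rcvbuf(int fd)
{
	int size;
	socklen_t slen;

	slen = sizeof(size);	/* must hold the buffer size on entry */
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &slen) < 0)
		return (-1);
	return (size);
}

/*
 * Hypothetical helper: ask the kernel to double SO_RCVBUF on fd.
 * Returns the receive buffer size in effect afterwards, or -1 if the
 * size cannot be read.  A failed setsockopt() is not treated as fatal;
 * the old size simply stays in place.
 */
static int
double_rcvbuf(int fd)
{
	int size;

	if ((size = get_rcvbuf(fd)) < 0)
		return (-1);
	size *= 2;
	(void)setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
	return (get_rcvbuf(fd));
}
```

A caller would invoke double_rcvbuf() once per socket, right after bind().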


>Release-Note:
>Audit-Trail:

From: Chris Gabe <chris@borderware.com>
To: freebsd-gnats-submit@FreeBSD.org,
	Chris Gabe <chris@borderware.com>
Cc:  
Subject: Re: bin/72366: syslog overflow fix
Date: Wed, 6 Oct 2004 04:46:29 -0400

 In my original description of how to reproduce, I didn't mention that 
 the syslog lines should include a loop counter, so that syslogd does not 
 collapse repeated lines into "last message repeated N times".  Here is a 
 test program:
 
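 #include <stdlib.h>
 #include <syslog.h>
 
 int
 main(int argc, char **argv)
 {
         const char *name;
         int i, maxi;
 
         if (argc < 3)
                 return (1);     /* usage: test count string */
         maxi = atoi(argv[1]);
         name = argv[2];
         /* The counter keeps consecutive lines from being identical. */
         for (i = 0; i < maxi; i++)
                 syslog(LOG_INFO, "%s %d", name, i);
         return (0);
 }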
 
 
 Invoke it something like this (the ./ keeps the shell from running 
 test(1) instead of the program):
 	./test 1000000 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &
 	./test 1000000 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &
 	./test 1000000 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &
 	./test 1000 bbbbbbbbbbbbbbbbbbbbbbb
 	killall test
 	grep bbbbb /var/log/messages | wc -l
 
State-Changed-From-To: open->patched 
State-Changed-By: glebius 
State-Changed-When: Fri Oct 8 21:12:28 GMT 2004 
State-Changed-Why:  
Patch (partly) applied. 


Responsible-Changed-From-To: freebsd-bugs->glebius 
Responsible-Changed-By: glebius 
Responsible-Changed-When: Fri Oct 8 21:12:28 GMT 2004 
Responsible-Changed-Why:  
I'm working on it. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=72366 
State-Changed-From-To: patched->closed 
State-Changed-By: glebius 
State-Changed-When: Sun Dec 5 19:04:54 GMT 2004 
State-Changed-Why:  
Changes MFCed. 

>Unformatted:
