From mandrews@bit0.com  Mon Mar 19 03:30:02 2007
Return-Path: <mandrews@bit0.com>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 3A4DB16A402
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 19 Mar 2007 03:30:02 +0000 (UTC)
	(envelope-from mandrews@bit0.com)
Received: from mindcrime.bit0.com (bit0.com [207.246.88.211])
	by mx1.freebsd.org (Postfix) with ESMTP id 90D6613C44C
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 19 Mar 2007 03:30:01 +0000 (UTC)
	(envelope-from mandrews@bit0.com)
Received: by mindcrime.bit0.com (Postfix, from userid 502)
	id 5020B730003; Sun, 18 Mar 2007 23:05:10 -0400 (EDT)
Message-Id: <20070319030510.5020B730003@mindcrime.bit0.com>
Date: Sun, 18 Mar 2007 23:05:10 -0400 (EDT)
From: Mike Andrews <mandrews@bit0.com>
Reply-To: Mike Andrews <mandrews@bit0.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: net-snmp proc monitoring randomly fails
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         110498
>Category:       ports
>Synopsis:       net-snmp proc monitoring randomly fails
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kuriyama
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 19 03:40:04 GMT 2007
>Closed-Date:    Sun Mar 25 12:36:59 GMT 2007
>Last-Modified:  Sun Mar 25 12:40:15 GMT 2007
>Originator:     Mike Andrews
>Release:        FreeBSD 6.2-RELEASE-p2 amd64
>Organization:
Fark.com LLC
>Environment:
System: FreeBSD mindcrime.bit0.com 6.2-RELEASE-p2 FreeBSD 6.2-RELEASE-p2 #19: Sun Mar 4 15:16:21 EST 2007 mandrews@mindcrime.bit0.com:/usr/obj/usr/src/sys/MINDCRIME amd64


>Description:

With net-snmp 5.3.1 and FreeBSD 6.2-RELEASE (i386 or amd64) the "proc"
monitoring facility will randomly raise alarms that certain processes
are not running (or not enough are running) when in fact they are.
The alarms start suddenly, with no warning, and then clear themselves
several hours later.

If you have Nagios checking these alarms, it can be highly annoying. :)

I'm fairly certain net-snmp 5.2.x and earlier don't have this problem
(I've been using them for years).

The problem is that net-snmp uses /bin/ps to get a list of processes
and writes the output of ps to /var/net-snmp/.snmp-exec-cache.  The
file is truncated at 16000 bytes (MAXCACHESIZE, defined as 200*80).
This is way too small for systems with many hundreds of running
processes at a time.
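
As a rough check of the size problem, the shell sketch below compares a
byte count against the cache limit.  The function name cache_status is
hypothetical (it is not part of net-snmp).  The figures come from this
report: 16000 bytes is the current limit, and 23808 bytes is the size of
the httpd lines alone as counted in How-To-Repeat.  The ps invocation
that snmpd actually runs may produce lines of a different length, so
treat this as an approximation only.

```shell
#!/bin/sh
# Hypothetical helper (not part of net-snmp): report whether BYTES of ps
# output would fit within a cache of LIMIT bytes.
cache_status() {
    bytes=$(( $1 ))   # arithmetic expansion also tolerates the padding wc adds
    limit=$(( $2 ))
    if [ "$bytes" -gt "$limit" ]; then
        echo "truncated"
    else
        echo "fits"
    fi
}

# Figures from this report: the httpd lines alone are 23808 bytes.
cache_status 23808 $((200 * 80))     # current limit, 16000 bytes
cache_status 23808 $((1500 * 80))    # proposed limit, 120000 bytes
```

On a live system the first argument could come from something like
`ps -acx | wc -c`, assuming that listing is close to what snmpd caches.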

Maybe previous versions (5.2.x and earlier) of net-snmp used something
other than /bin/ps to get the process list?  I don't have a procfs
filesystem mounted (I tried mounting one to see if it would help, and
it didn't).

>How-To-Repeat:

bourbon# grep proc /usr/local/share/snmp/snmpd.conf
proc syslogd 1 1
proc httpd
proc ntpd 1 1
proc smartd
proc clamd
proc freshclam
bourbon# ps -U vscan | grep clam
84154  ??  Is     0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265  ??  Is     0:04.61 /usr/local/sbin/clamd
bourbon# snmpwalk -v 2c -c ___ localhost .1.3.6.1.4.1.2021.2.1
UCD-SNMP-MIB::prIndex.1 = INTEGER: 1
UCD-SNMP-MIB::prIndex.2 = INTEGER: 2
UCD-SNMP-MIB::prIndex.3 = INTEGER: 3
UCD-SNMP-MIB::prIndex.4 = INTEGER: 4
UCD-SNMP-MIB::prIndex.5 = INTEGER: 5
UCD-SNMP-MIB::prIndex.6 = INTEGER: 6
UCD-SNMP-MIB::prNames.1 = STRING: syslogd
UCD-SNMP-MIB::prNames.2 = STRING: httpd
UCD-SNMP-MIB::prNames.3 = STRING: ntpd
UCD-SNMP-MIB::prNames.4 = STRING: smartd
UCD-SNMP-MIB::prNames.5 = STRING: clamd
UCD-SNMP-MIB::prNames.6 = STRING: freshclam
UCD-SNMP-MIB::prMin.1 = INTEGER: 1
UCD-SNMP-MIB::prMin.2 = INTEGER: 0
UCD-SNMP-MIB::prMin.3 = INTEGER: 1
UCD-SNMP-MIB::prMin.4 = INTEGER: 0
UCD-SNMP-MIB::prMin.5 = INTEGER: 0
UCD-SNMP-MIB::prMin.6 = INTEGER: 0
UCD-SNMP-MIB::prMax.1 = INTEGER: 1
UCD-SNMP-MIB::prMax.2 = INTEGER: 0
UCD-SNMP-MIB::prMax.3 = INTEGER: 1
UCD-SNMP-MIB::prMax.4 = INTEGER: 0
UCD-SNMP-MIB::prMax.5 = INTEGER: 0
UCD-SNMP-MIB::prMax.6 = INTEGER: 0
UCD-SNMP-MIB::prCount.1 = INTEGER: 1
UCD-SNMP-MIB::prCount.2 = INTEGER: 345
UCD-SNMP-MIB::prCount.3 = INTEGER: 1
UCD-SNMP-MIB::prCount.4 = INTEGER: 1
UCD-SNMP-MIB::prCount.5 = INTEGER: 0
UCD-SNMP-MIB::prCount.6 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.1 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.2 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.3 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.4 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.5 = INTEGER: 1
UCD-SNMP-MIB::prErrorFlag.6 = INTEGER: 1
UCD-SNMP-MIB::prErrMessage.1 = STRING:
UCD-SNMP-MIB::prErrMessage.2 = STRING:
UCD-SNMP-MIB::prErrMessage.3 = STRING:
UCD-SNMP-MIB::prErrMessage.4 = STRING:
UCD-SNMP-MIB::prErrMessage.5 = STRING: No clamd process running.
UCD-SNMP-MIB::prErrMessage.6 = STRING: No freshclam process running.
UCD-SNMP-MIB::prErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.3 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.4 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.5 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.6 = INTEGER: 0
UCD-SNMP-MIB::prErrFixCmd.1 = STRING:
UCD-SNMP-MIB::prErrFixCmd.2 = STRING:
UCD-SNMP-MIB::prErrFixCmd.3 = STRING:
UCD-SNMP-MIB::prErrFixCmd.4 = STRING:
UCD-SNMP-MIB::prErrFixCmd.5 = STRING:
UCD-SNMP-MIB::prErrFixCmd.6 = STRING:
bourbon# ps -U vscan | grep clam
84154  ??  Is     0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265  ??  Is     0:04.61 /usr/local/sbin/clamd
bourbon# ps -acx | grep httpd | wc
     744    3720   23808

(744 httpd processes are running, but prCount.2 reports only 345.)   ;-)
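
The comparison above can be scripted.  This is a minimal sketch, assuming
snmpwalk output in the format shown in the transcript.  The function names
pr_count and undercount are made up for illustration, and the numbers are
the ones from this report.

```shell
#!/bin/sh
# Hypothetical helper: extract the integer from one snmpwalk output line,
# e.g. "UCD-SNMP-MIB::prCount.2 = INTEGER: 345".  Prints nothing if the
# line does not carry an INTEGER value.
pr_count() {
    echo "$1" | sed -n 's/.*= INTEGER: \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: how many processes the agent missed, given the
# count it reported ($1) and the count ps actually shows ($2).
undercount() {
    echo $(( $2 - $1 ))
}

reported=$(pr_count "UCD-SNMP-MIB::prCount.2 = INTEGER: 345")
actual=744    # line count from: ps -acx | grep httpd | wc
echo "snmpd reports $reported httpd, ps shows $actual, missed $(undercount "$reported" "$actual")"
```

A nonzero result for a process whose prMax is 0 (no upper limit) suggests
that the cached ps listing was cut short.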

>Fix:

Try this patch, though only the second half of it (the change to
include/net-snmp/net-snmp-config.h.in) seems to actually fix the problem:


*** acconfig.h.orig     Fri May 26 12:36:06 2006
--- acconfig.h  Sun Mar 18 22:24:27 2007
***************
*** 488,494 ****

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80)   /* roughly 200 lines max */

  /* misc defaults */

--- 488,494 ----

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80)   /* roughly 1500 lines max */

  /* misc defaults */

*** include/net-snmp/net-snmp-config.h.in.orig  Fri May 26 12:36:06 2006
--- include/net-snmp/net-snmp-config.h.in       Sun Mar 18 22:54:13 2007
***************
*** 1334,1340 ****

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80)   /* roughly 200 lines max */

  /* misc defaults */

--- 1334,1340 ----

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80)   /* roughly 1500 lines max */

  /* misc defaults */

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-ports-bugs->kuriyama 
Responsible-Changed-By: edwin 
Responsible-Changed-When: Mon Mar 19 06:55:10 UTC 2007 
Responsible-Changed-Why:  
Over to maintainer 

http://www.freebsd.org/cgi/query-pr.cgi?pr=110498 
State-Changed-From-To: open->closed 
State-Changed-By: kuriyama 
State-Changed-When: Sun Mar 25 12:36:35 UTC 2007 
State-Changed-Why:  
Increased to 120KB, as in your patch.  Thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=110498 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: ports/110498: commit references a PR
Date: Sun, 25 Mar 2007 12:35:54 +0000 (UTC)

 kuriyama    2007-03-25 12:35:46 UTC
 
   FreeBSD ports repository
 
   Modified files:
     net-mgmt/net-snmp    Makefile 
     net-mgmt/net-snmp/files snmpd.sh.in 
   Added files:
     net-mgmt/net-snmp/files patch-net-snmp-config.h.in 
   Log:
   - Remove "sig_stop=KILL" in snmpd.sh.in.  This was introduced when
     PR ports/63759 was committed (3 years ago).  Use the normal TERM
     signal for graceful termination [1].
   - Increase /bin/ps cache size from 16KB to 120KB.  This should fix
     the process counter (e.g. prCount.1) on servers with a large
     number of processes [2].
   
   PR:             ports/103811 [1], ports/110498 [2]
   Reported by:    Yuri Arabadji <yuri@deepunix.net> [1],
                   Mike Andrews <mandrews@bit0.com> [2]
   
   Revision  Changes    Path
   1.141     +1 -1      ports/net-mgmt/net-snmp/Makefile
   1.1       +11 -0     ports/net-mgmt/net-snmp/files/patch-net-snmp-config.h.in (new)
   1.6       +1 -2      ports/net-mgmt/net-snmp/files/snmpd.sh.in
 
>Unformatted:
