From tshansen@amp.nlanr.net  Thu Mar  2 12:41:08 2000
Return-Path: <tshansen@amp.nlanr.net>
Received: from amp.nlanr.net (nlanr-amp.nws.orst.edu [128.193.128.26])
	by hub.freebsd.org (Postfix) with ESMTP id 803D737BD28
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  2 Mar 2000 12:41:02 -0800 (PST)
	(envelope-from tshansen@amp.nlanr.net)
Received: (from tshansen@localhost)
	by amp.nlanr.net (8.9.1/8.9.1) id MAA08259;
	Thu, 2 Mar 2000 12:40:45 -0800 (PST)
	(envelope-from tshansen)
Message-Id: <200003022040.MAA08259@amp.nlanr.net>
Date: Thu, 2 Mar 2000 12:40:45 -0800 (PST)
From: Todd Hansen <tshansen@amp.nlanr.net>
Reply-To: tshansen@nlanr.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: problem with cron forgetting jobs
X-Send-Pr-Version: 3.2

>Number:         17134
>Category:       bin
>Synopsis:       problem with 3.0-RELEASE cron forgetting jobs
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar  2 12:50:02 PST 2000
>Closed-Date:    Wed Jan 30 01:20:33 PST 2002
>Last-Modified:  Wed Jan 30 01:20:34 PST 2002
>Originator:     Todd Hansen
>Release:        FreeBSD 3.0-RELEASE i386
>Organization:
National Laboratory for Applied Network Research (NLANR)
>Environment:

host info:

> uname -a
FreeBSD amp.nlanr.net 3.2-RELEASE FreeBSD 3.2-RELEASE #4: Wed Jul 28 21:22:56 PDT 1999     hwb@amp.nlanr.net:/usr/src/sys/compile/NAI-AMP  i386
> 

crontab entries affected:

joule.nlanr.net.actmon> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.eTKvHXl788 installed on Fri Dec 31 11:14:06 1999)
# (Cron version -- $Id: crontab.c,v 1.11 1997/09/15 06:39:15 charnier Exp $)
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.ZOyLKT3056 installed on Tue Oct 27 16:51:55 1998)
# (Cron version -- $Id: crontab.c,v 1.6.2.3 1998/03/09 11:42:00 jkh Exp $)
* * * * *       cd $HOME/src/pinger/vBNS ; sleep `jot -r 1 1 15` ; ./docollector
0,10,20,30,40,50 * * * *      cd $HOME/src/pinger/vBNS ; sleep `jot -r 1 1 15` ; nice ./dogentrace ; cd $HOME/src/pinger/ ; ./watchdog -ko -t 600 -w watchdog.file "./am_master -n 10 -w watchdog.file amp volt &"
10 2 * * * find $HOME/src/pinger/vBNS/data -type f -mtime +5 -exec rm {} \;
joule.nlanr.net.actmon>


>Description:

We run a distributed system of 102 active measurement probes around the Internet (all running FreeBSD 3.0).
We are noticing that periodically (almost regularly) the cron daemon forgets about some of our jobs, even
though crontab -l still lists them. This happens on about 10 systems in about 2 months.

Anyway, the problem is related to what was mentioned in bin/6004, except that we have more information and a greater need to work with you to
figure this out. Unfortunately we are still running 3.0 until we can determine whether this is fixed in 3.4, since it is a big deal to upgrade 102 sites remotely.

When cron forgets about a job, it still tries to execute the job, but instead of
actually executing it we see something like this in the log:

Mar  2 12:20:00 nai-a-odun /USR/SBIN/CRON[6248]: (actmon) CMD ()

The CMD field is blank, but the entry is logged at the time the job was scheduled to run. The interesting thing is
that other commands still run fine while this one does not. The line most affected
by this problem is the one in the above crontab that runs ./dogentrace every 10 minutes.
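
A minimal sketch for locating the symptom described above: it prints the cron log lines whose CMD field is empty. The function name find_empty_cmd is hypothetical, and the log file is passed as an argument because its location varies between systems.

```shell
#!/bin/sh
# find_empty_cmd LOGFILE
# Print log lines of the form
#   ... /USR/SBIN/CRON[6248]: (actmon) CMD ()
# i.e. a "(user) CMD ()" entry with nothing between the last parentheses.
# The function name is hypothetical; LOGFILE is whatever file cron logs to.
find_empty_cmd() {
    grep -E '\([^)]*\) CMD \(\)[[:space:]]*$' "$1"
}
```

Calling find_empty_cmd with the path of the cron log on each probe would show when the empty entries started and which user they belong to.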

thanks.
Todd


>How-To-Repeat:

It seems to repeat within a reasonable amount of time on our systems, probably because we have so many.

>Fix:
	
We would love one, if it can be found.


>Release-Note:
>Audit-Trail:

From: Gregory Bond <gnb@itga.com.au>
To: tshansen@nlanr.net
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: bin/17134: problem with cron forgetting jobs 
Date: Fri, 03 Mar 2000 10:03:36 +1100

 I have seen similar behaviour as well on various versions up to 3.4-STABLE.  In
 particular, if you are testing cron jobs and repeatedly putting crontab entries
 in for say 5 minutes in advance of the current time, sometimes these jobs don't
 run (but the cron log entry is generated as mentioned in the PR).
 
 I've also seen this happen on cron on Solaris 2.6, btw, which is also _very_
 broken in its handling of quotes and escaped % characters. There is no way under
 Solaris cron to get the literal string '%' (i.e. squote pct squote) into a
 command.  Unescaped % is a newline (as per the manual), '\%' is passed as '\%'.
  This does work as expected on FreeBSD.
 
 
 
 
 
 

From: Sheldon Hearn <sheldonh@uunet.co.za>
To: tshansen@nlanr.net
Cc: FreeBSD-gnats-submit@FreeBSD.ORG, ghelmer@FreeBSD.org
Subject: Re: bin/17134: problem with cron forgetting jobs 
Date: Fri, 03 Mar 2000 12:18:14 +0200

 On Thu, 02 Mar 2000 12:40:45 PST, Todd Hansen wrote:
 
 > Anyway, the problem is related to what was mentioned in
 > bin/6004. Except we have more information and a greater need to work
 > with you to figure this out.
 
 Guy Helmer closed that PR because he couldn't get any more information
 from the originator.  I'm copying him on this mail in the hopes that
 he was actually interested in that PR. :-)
 
 Ciao,
 Sheldon.
 

From: Guy Helmer <ghelmer@cs.iastate.edu>
To: Sheldon Hearn <sheldonh@uunet.co.za>
Cc: tshansen@nlanr.net, FreeBSD-gnats-submit@FreeBSD.ORG,
	Guy Helmer <ghelmer@cs.iastate.edu>
Subject: Re: bin/17134: problem with cron forgetting jobs 
Date: Fri, 3 Mar 2000 10:41:45 -0600 (CST)

 On Fri, 3 Mar 2000, Sheldon Hearn wrote:
 
 > On Thu, 02 Mar 2000 12:40:45 PST, Todd Hansen wrote:
 > 
 > > Anyway, the problem is related to what was mentioned in
 > > bin/6004. Except we have more information and a greater need to work
 > > with you to figure this out.
 > 
 > Guy Helmer closed that PR because he couldn't get any more information
 > from the originator.  I'm copying him on this mail in the hopes that
 > he was actually interested in that PR. :-)
 
 I closed 6004 since I was neither able to verify that it was still a
 problem nor obtain further clues.  I am surprised that more people have not
 encountered this problem if it is simply exhibited with
 frequently-executed jobs.  I will try running cron with some debugging
 options enabled (-x proc and maybe some others), and see if I can
 duplicate this; if anyone else wants to do so also, that's fine :-)
 
 It may be helpful to obtain a core dump and executable image from a
 cron daemon built with "cd /usr/src/usr.sbin/cron && make CFLAGS=-g
 LDFLAGS=-static clean all" -- then run cron and "kill -6" it after it has
 started to exhibit this behavior. The bug's cause could be a stray pointer
 or an off-by-one error, but seeing what was in the data structures may
 help.
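 
 The steps above can be sketched as a small script. This is only an outline under stated assumptions: the helper names run and debug_cron_steps are hypothetical, /var/run/cron.pid is an assumed pid-file location, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the debugging steps suggested in this message.
# By default the commands are only printed; set DRY_RUN=0 to really run
# them (as root, on a test machine).
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

debug_cron_steps() {
    # 1. Build cron with debugging symbols, statically linked
    #    (command line taken from the message above).
    run sh -c 'cd /usr/src/usr.sbin/cron && make CFLAGS=-g LDFLAGS=-static clean all'

    # 2. Start the freshly built cron by hand. Where the binary ends up
    #    depends on the local object directory, so that step is not scripted.

    # 3. Once the empty "CMD ()" entries appear, send signal 6 (SIGABRT)
    #    so that cron dumps core. The pid-file path is an assumption.
    run sh -c 'kill -6 "$(cat /var/run/cron.pid)"'
}

debug_cron_steps
```

 Keeping the core file together with the exact binary that produced it matters, since the core is only useful with the matching executable.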
 
 Guy
 
 
 

From: Todd Hansen <tshansen@oceana.nlanr.net>
To: Guy Helmer <ghelmer@cs.iastate.edu>
Cc: Sheldon Hearn <sheldonh@uunet.co.za>,
	FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: bin/17134: problem with cron forgetting jobs 
Date: Fri, 3 Mar 2000 11:56:05 -0800 (PST)

 I will see what I can do with our systems to help you. Maybe I can run the
 tests you mentioned below. At the very least I can get you a core from a
 3.0-RELEASE cron binary.
 	-todd
 
 On Fri, 3 Mar 2000, Guy Helmer wrote:
 
 > On Fri, 3 Mar 2000, Sheldon Hearn wrote:
 > 
 > > On Thu, 02 Mar 2000 12:40:45 PST, Todd Hansen wrote:
 > > 
 > > > Anyway, the problem is related to what was mentioned in
 > > > bin/6004. Except we have more information and a greater need to work
 > > > with you to figure this out.
 > > 
 > > Guy Helmer closed that PR because he couldn't get any more information
 > > from the originator.  I'm copying him on this mail in the hopes that
 > > he was actually interested in that PR. :-)
 > 
 > I closed 6004 since I was neither able to verify that it was still a
 > problem nor obtain further clues.  I am surprised that more people have not
 > encountered this problem if it is simply exhibited with
 > frequently-executed jobs.  I will try running cron with some debugging
 > options enabled (-x proc and maybe some others), and see if I can
 > duplicate this; if anyone else wants to do so also, that's fine :-)
 > 
 > It may be helpful to obtain a core dump and executable image from a
 > cron daemon built with "cd /usr/src/usr.sbin/cron && make CFLAGS=-g
 > LDFLAGS=-static clean all" -- then run cron and "kill -6" it after it has
 > started to exhibit this behavior. The bug's cause could be a stray pointer
 > or an off-by-one error, but seeing what was in the data structures may
 > help.
 > 
 > Guy
 > 
 > 
 > 
 
 
State-Changed-From-To: open->feedback 
State-Changed-By: mike 
State-Changed-When: Sat Jul 21 17:12:34 PDT 2001 
State-Changed-Why:  

Does this problem still occur in newer versions of FreeBSD, 
such as 4.3-RELEASE? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=17134 

From: Mike Barcroft <mike@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: bin/17134: problem with 3.0-RELEASE cron forgetting jobs
Date: Sun, 26 Aug 2001 01:58:44 -0400

 Adding to Audit-Trail.
 
 ----- Forwarded message from Todd Hansen <tshansen@nlanr.net> -----
 
 Delivered-To: mike@freebsd.org
 X-Authentication-Warning: mave.nlanr.net: tshansen owned process doing -bs
 Date: Mon, 6 Aug 2001 08:59:01 -0700 (PDT)
 From: Todd Hansen <tshansen@nlanr.net>
 To: mike@FreeBSD.org
 Cc: freebsd-bugs@FreeBSD.org, tonym@nlanr.net
 Subject: Re: bin/17134: problem with 3.0-RELEASE cron forgetting jobs
 In-Reply-To: <200107220012.f6M0Cmg21052@freefall.freebsd.org>
 
 I have forgotten how far we tested, maybe tony knows. I think we tested
 with 4.0 cron? I know we stopped playing with the bug well before 4.3 was
 released. We were never able to find a version which did not exhibit the
 bug. However, the problem was very rare and we can only see it because of
 the number of machines we are running. I don't know if we have seen the
 bug recently though.
 	-todd
 
 
 ----- End forwarded message -----

From: Mike Barcroft <mike@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: bin/17134: problem with 3.0-RELEASE cron forgetting jobs
Date: Sun, 26 Aug 2001 02:00:36 -0400

 Adding to Audit-Trail.
 
 ----- Forwarded message from Tony McGregor <tonym@cs.waikato.ac.nz> -----
 
 Delivered-To: mike@freebsd.org
 Date: Fri, 24 Aug 2001 10:19:44 +1200 (NZST)
 From: Tony McGregor <tonym@cs.waikato.ac.nz>
 To: Todd Hansen <tshansen@nlanr.net>
 Cc: mike@FreeBSD.org, freebsd-bugs@FreeBSD.org
 Subject: Re: bin/17134: problem with 3.0-RELEASE cron forgetting jobs
 In-Reply-To: <Pine.BSF.4.21.0108060856300.93879-100000@mave.nlanr.net>
 
 
 We haven't seen it since I added code to limit the number of concurrent
 traceroutes.  That adds weight to the theory that the bug occurred when
 the system ran out of memory.
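 
 For readers who want to try a similar workaround, here is one possible way to cap the number of concurrent jobs from a shell wrapper. It is not the code referred to above, which is not shown in this PR; the names count_running, under_limit and run_limited and the default limit of 4 are all hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: do not start another job while too many are running.
MAX_JOBS="${MAX_JOBS:-4}"   # assumed limit, not taken from the report

# count_running NAME: number of processes whose command name is exactly NAME.
count_running() {
    ps -A -o comm= | awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# under_limit COUNT LIMIT: succeed when COUNT is strictly below LIMIT.
under_limit() {
    [ "$1" -lt "$2" ]
}

# run_limited NAME CMD...: run CMD only if fewer than MAX_JOBS processes
# named NAME currently exist; otherwise report and return non-zero.
run_limited() {
    name=$1
    shift
    if under_limit "$(count_running "$name")" "$MAX_JOBS"; then
        "$@"
    else
        echo "skipping: $MAX_JOBS or more $name processes already running" >&2
        return 1
    fi
}
```

 A job would then be started as, for example, run_limited traceroute traceroute some.host. The check and the start are not atomic, so the limit is approximate; that is usually enough when the goal is only to keep memory use bounded.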
 
 I haven't tested recent versions of FreeBSD.
 
 On Mon, 6 Aug 2001, Todd Hansen wrote:
 
 > I have forgotten how far we tested, maybe tony knows. I think we tested
 > with 4.0 cron? I know we stopped playing with the bug well before 4.3 was
 > released. We were never able to find a version which did not exhibit the
 > bug. However, the problem was very rare and we can only see it because of
 > the number of machines we are running. I don't know if we have seen the
 > bug recently though.
 > 	-todd
 
 ----------------------------------------------------------------------------
 Tony McGregor                   Mail:   T.McGregor@cs.waikato.ac.nz 
 Department of Computer Science  Phone:  +64 7 838 4651 
 Waikato University              Fax:    +64 7 858 5095       
 Private Bag 3105                Home:   +64 7 825 5040 mobile: (021)313004
 Hamilton, New Zealand           www:    http://www.cs.waikato.ac.nz/~tonym
 ----------------------------------------------------------------------------
 
 
 
 ----- End forwarded message -----
State-Changed-From-To: feedback->closed 
State-Changed-By: sheldonh 
State-Changed-When: Wed Jan 30 01:20:33 PST 2002 
State-Changed-Why:  
Automatic feedback timeout.  This PR remained unchanged in the feedback 
state for more than 4 months. 

If additional feedback that warrants the re-opening of this PR is 
available but not included in the audit trail, please include the 
feedback in a reply to this message (preserving the Subject line) and 
ask that the PR be re-opened. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=17134 
>Unformatted:
