From nobody@FreeBSD.ORG Mon Sep 27 00:30:21 1999
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id B119115254; Mon, 27 Sep 1999 00:30:21 -0700 (PDT)
Message-Id: <19990927073021.B119115254@hub.freebsd.org>
Date: Mon, 27 Sep 1999 00:30:21 -0700 (PDT)
From: riccardo@torrini.org
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@freebsd.org
Subject: routed exit after some day of work with signal 6 (core dump)
X-Send-Pr-Version: www-1.0

>Number:         13992
>Category:       misc
>Synopsis:       routed exit after some day of work with signal 6 (core dump)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Sep 27 00:40:00 PDT 1999
>Closed-Date:    Wed Sep 29 01:52:27 PDT 1999
>Last-Modified:  Wed Sep 29 01:55:36 PDT 1999
>Originator:     Riccardo Torrini
>Release:        FreeBSD 3.3-STABLE (OCTOPUSSY) #0: Wed Sep 22 08:57:26 CEST 1999
>Organization:
ESAOTE s.p.a.
>Environment:
FreeBSD snail.fi.esaote.it 3.3-STABLE FreeBSD 3.3-STABLE #0: Wed Sep 22 08:57:26 CEST 1999     root@snail.fi.esaote.it:/usr/src/sys/compile/OCTOPUSSY  i386

>Description:
For the third time, routed -s has exited with signal 6 (SIGABRT) after a
few days of uptime, without any other message. The exit is visible on the
console (and as the last line of dmesg) but not always in /var/log/messages.
The machine has been up since 22.9.1999 23:42 (rebooted after make world).

From dmesg (this happened at 02:22 GMT+1 this morning, 27.9.1999):
-----8<----------8<----------8<-----
CPU: i486 DX2 (486-class CPU)
  Origin = "GenuineIntel"  Id = 0x435  Stepping = 5
  Features=0x3<FPU,VME>
real memory  = 92274688 (90112K bytes)
avail memory = 86380544 (84356K bytes)
[...]
changing root device to da0s1a
pid 1147 (routed), uid 0: exited on signal 6 (core dumped)
-----8<----------8<----------8<-----

From messages:
-----8<----------8<----------8<-----
Sep 24 21:24:43 snail routed[1147]: select: Invalid argument
Sep 24 21:24:44 snail /kernel: pid 1147 (routed), uid 0: exited on signal 6 (core dumped)
-----8<----------8<----------8<-----

>How-To-Repeat:
On my machine, an HP NetServer 4/66-LC (486/DX2-66) used as an internal
router with 4 Intel EtherExpress Pro/10 cards on ISA and as an Internet
gateway with an internal 56k USRobotics modem, it happens every few days.
I rebuilt world and kernel after cvsupping at the end of August, at the
beginning of September, and on 20 and 24 September.  Not more often,
because I need a full 24 hours to build and install :-(
>Fix:
I have a workaround: a script that polls the process list and
respawns "routed -s" when it dies.  Not a real fix... :-(

>Release-Note:
>Audit-Trail:

From: Ruslan Ermilov <ru@ucb.crimea.ua>
To: riccardo@torrini.org
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: misc/13992: routed exit after some day of work with signal 6 (core dump)
Date: Mon, 27 Sep 1999 13:18:41 +0300

 On Mon, Sep 27, 1999 at 12:30:21AM -0700, riccardo@torrini.org wrote:
 > 
 > For the 3rd time routed -s exits after some day of work with signal 6
 > (SIGABRT) without any other message. Visible on console (and as last
 > line of dmesg) but not always on /var/log/messages.
 > The machine is up from 22.9.1999-23:42 (reboot after make world)
 > 
 [...]
 > From messages:
 > -----8<----------8<----------8<-----
 > Sep 24 21:24:43 snail routed[1147]: select: Invalid argument
 > Sep 24 21:24:44 snail /kernel: pid 1147 (routed), uid 0: exited on signal 6 (core dumped)
 > 
 Could you please compile routed(8) with debug symbols, i.e.
 
 # cd /usr/src/sbin/routed; make DEBUG_FLAGS=-g clean all
 
 And run gdb(1) against the core file with this version of routed(8)?
 
 
 Thanks,
 -- 
 Ruslan Ermilov		Sysadmin and DBA of the
 ru@ucb.crimea.ua	United Commercial Bank,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.247.647	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age
 

From: Ruslan Ermilov <ru@ucb.crimea.ua>
To: Riccardo Torrini <riccardo@torrini.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: misc/13992: routed exit after some day of work with signal 6 (core dump)
Date: Tue, 28 Sep 1999 15:56:26 +0300

 --bp/iNruPH9dso1Pn
 Content-Type: text/plain; charset=us-ascii
 
 On Tue, Sep 28, 1999 at 12:52:08PM +0200, Riccardo Torrini wrote:
 > Ruslan Ermilov wrote:
 > 
 > > Great!  Send me this core file as well, and in two minutes after
 > > that you'll know what happened, I'm very close to it.
 > 
 > Here it is.
 > 
 The problem is that gettimeofday(2) returns garbage on your machine, and
 the timeout value passed to select(2) becomes invalid:
 
 : GNU gdb 4.18
 : Copyright 1998 Free Software Foundation, Inc.
 : GDB is free software, covered by the GNU General Public License, and you are
 : welcome to change it and/or distribute copies of it under certain conditions.
 : Type "show copying" to see the conditions.
 : There is absolutely no warranty for GDB.  Type "show warranty" for details.
 : This GDB was configured as "i386-unknown-freebsd"...
 : Core was generated by `routed'.
 : Program terminated with signal 6, Abort trap.
 : #0  0x806d114 in kill ()
 : (gdb) where
 : #0  0x806d114 in kill ()
 : #1  0x806c608 in abort ()
 : #2  0x804c208 in logbad (dump=1, p=0x80718c7 "select: %s")
 :     at /usr/src/sbin/routed/main.c:901
 : #3  0x804b90b in main (argc=0, argv=0xbfbfdbfc)
 :     at /usr/src/sbin/routed/main.c:468
 : #4  0x80480e9 in _start ()
 : (gdb) up 3
 : #3  0x804b90b in main (argc=0, argv=0xbfbfdbfc)
 :     at /usr/src/sbin/routed/main.c:468
 : 468					BADERR(1,"select");
 : (gdb) list
 : 463			trace_flush();
 : 464			ibits = fdbits;
 : 465			n = select(sock_max, &ibits, 0, 0, &wtime);
 : 466			if (n <= 0) {
 : 467				if (n < 0 && errno != EINTR && errno != EAGAIN)
 : 468					BADERR(1,"select");
 : 469				continue;
 : 470			}
 : 471	
 : 472			if (FD_ISSET(rt_sock, &ibits)) {
 : (gdb) print sock_max
 : $1 = 6
 : (gdb) print wtime
 : $2 = {tv_sec = 3, tv_usec = 695150852}
                               ^^^^^^^^^
 			      that's why select(2) returned EINVAL
 : (gdb) print ifinit_timer
 : $3 = {tv_sec = 184988, tv_usec = 841300}
 : (gdb) print now
 : $4 = {tv_sec = 184985, tv_usec = -694309552}
                                    ^^^^^^^^^^
 				   what's up?
 : (gdb) print epoch
 : $5 = {tv_sec = 938326603, tv_usec = 194765}
 : (gdb) print clk
 : $6 = {tv_sec = 938511589, tv_usec = -695114787}
                                       ^^^^^^^^^^
 				      bah, gettimeofday(2) failed!
 : (gdb) print prev_clk
 : $7 = {tv_sec = 938511586, tv_usec = 900334}
 : (gdb) quit
 
 
 Could you please compile and run the attached test program?
 Let it run until it finishes.  If it finishes, it will print an
 incorrect date returned by gettimeofday().
 
 Then please send me the output of this test (if any), as well as
 the output of the following commands:
 
 # cat /var/run/dmesg.boot
 # sysctl kern.timecounter.method machdep.tsc_freq
 
 
 Cheers,
 -- 
 Ruslan Ermilov		Sysadmin and DBA of the
 ru@ucb.crimea.ua	United Commercial Bank,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.247.647	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age
 
 --bp/iNruPH9dso1Pn
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="gettimeofday_test.c"
 
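 #include <sys/time.h>
 #include <err.h>
 #include <stddef.h>
 
 /*
  * Poll gettimeofday(2) until it returns a tv_usec outside the valid
  * range [0, 1000000), then report the bad value and exit.
  */
 int
 main(void)
 {
     struct timeval tp;
 
     for (;;) {
 	if (gettimeofday(&tp, NULL) == -1) {
 	    err(1, "gettimeofday");
 	}
 
 	if (tp.tv_usec < 0 || tp.tv_usec >= 1000000) {
 	    errx(1, "invalid time returned: %ld:%ld",
 		(long)tp.tv_sec, (long)tp.tv_usec);
 	}
     }
 }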
 
 --bp/iNruPH9dso1Pn--
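
The values in the gdb session above can be reproduced by hand.  The
sketch below is an illustration only, not code taken from routed: it
assumes a subtraction helper that borrows at most one second (the name
timeval_sub_one_borrow is hypothetical), which is consistent with the
numbers gdb printed.  It also shows the range check that a timeout must
pass before select(2) can be expected to accept it.

```c
#include <stdbool.h>
#include <sys/time.h>

/* Microseconds per second; a valid tv_usec lies in [0, 1000000). */
#define EXAMPLE_USEC_PER_SEC 1000000L

/* True if both fields are in the range a select(2) timeout requires. */
static bool
timeval_is_valid(const struct timeval *tv)
{
	return tv->tv_sec >= 0 &&
	    tv->tv_usec >= 0 && tv->tv_usec < EXAMPLE_USEC_PER_SEC;
}

/*
 * Return a - b, borrowing at most one second when the microsecond
 * difference is negative.  ASSUMPTION: this models the kind of helper
 * the daemon uses; a single borrow cannot repair a tv_usec that is off
 * by hundreds of seconds, so garbage in one operand propagates.
 */
static struct timeval
timeval_sub_one_borrow(struct timeval a, struct timeval b)
{
	struct timeval res;

	res.tv_sec = a.tv_sec - b.tv_sec;
	res.tv_usec = a.tv_usec - b.tv_usec;
	if (res.tv_usec < 0) {
		res.tv_sec--;
		res.tv_usec += EXAMPLE_USEC_PER_SEC;
	}
	return res;
}
```

With clk = {938511589, -695114787} and epoch = {938326603, 194765},
timeval_sub_one_borrow(clk, epoch) gives {184985, -694309552}, the value
gdb shows for now.  Subtracting that from ifinit_timer = {184988, 841300}
gives {3, 695150852}, the value gdb shows for wtime, and
timeval_is_valid() rejects it.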
 

From: Riccardo Torrini <riccardo@torrini.org>
To: Ruslan Ermilov <ru@ucb.crimea.ua>
Cc:  
Subject: Re: misc/13992: routed exit after some day of work with signal 6 (core 
 dump)
Date: Tue, 28 Sep 1999 16:20:43 +0200

 This is a multi-part message in MIME format.
 --------------225CBA2E802707D70362173B
 Content-Type: text/plain; charset=us-ascii
 Content-Transfer-Encoding: 7bit
 
 Ruslan Ermilov wrote:
 
 > Let it run until it finishes.  If it finishes, it will print an
 > incorrect date returned by gettimeofday().
 
 After about 20 minutes:
 gettimeofday_test: invalid time returned: 938527821:-695331771
 
 
 # sysctl kern.timecounter.method machdep.tsc_freq
 kern.timecounter.method: 0
 
 
 # sysctl -a | grep -i machdep
 machdep.consdev: { major = 0, minor = 0 }
 machdep.adjkerntz: -7200
 machdep.disable_rtc_set: 0
 machdep.wall_cmos_clock: 1
 machdep.do_dump: 1
 machdep.ispc98: 0
 machdep.msgbuf: 
 machdep.msgbuf_clear: 0
 machdep.i8254_freq: 1193182
 machdep.conspeed: 9600
 
 
 # sysctl -a | grep -i tsc
 
 
 # sysctl -a | grep -i freq
 kern.acct_chkfreq: 15
 machdep.i8254_freq: 1193182
 
 
 Sorry, there is no machdep.tsc_freq here (only the similar-sounding
 machdep.i8254_freq).  If you are sure of the spelling, I am missing
 something :-(
 
 
 Ciao++
 Vic.
 /------------------------+---------------------------------------\
 | Riccardo "VIC" Torrini | W.W.W.: www.torrini.org            // |
 |   Via Montebello, 64   | e-mail : riccardo@torrini.org     //  |
 |   50123 Firenze  (I)   +--------------------------------\\//---|
 | phone: +39-055-286.574 |        This space for rent :-)        |
 \------------------------+---------------------------------------/
 --------------225CBA2E802707D70362173B
 Content-Type: text/plain; charset=us-ascii;
  name="dmesg.boot"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="dmesg.boot"
 
 Copyright (c) 1992-1999 FreeBSD Inc.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
 	The Regents of the University of California. All rights reserved.
 FreeBSD 3.3-STABLE #0: Wed Sep 22 08:57:26 CEST 1999
     root@snail.fi.esaote.it:/usr/src/sys/compile/OCTOPUSSY
 Timecounter "i8254"  frequency 1193182 Hz
 CPU: i486 DX2 (486-class CPU)
   Origin = "GenuineIntel"  Id = 0x435  Stepping = 5
   Features=0x3<FPU,VME>
 real memory  = 92274688 (90112K bytes)
 avail memory = 86380544 (84356K bytes)
 Preloaded elf kernel "kernel" at 0xc0318000.
 Preloaded splash_image_data "/boot/daemon.bmp" at 0xc031809c.
 Preloaded elf module "splash_bmp.ko" at 0xc03180ec.
 eisa0: <HWPc081 (System Board)>
 Probing for devices on the EISA bus
 ahc0: <Adaptec aic7770 SCSI host adapter> at 0xbc00-0xbcff irq 15
 ahc0: on eisa0 slot 11
 ahc0: aic7770 >= Rev E, Twin Channel, A SCSI Id=7, B SCSI Id=7, primary A, 4/255 SCBs
 Probing for devices on the ISA bus:
 sc0 on isa
 sc0: VGA color <16 virtual consoles, flags=0x0>
 atkbdc0 at 0x60-0x6f on motherboard
 atkbd0 irq 1 on isa
 psm0 not found
 sio2 at 0x3e8-0x3ef irq 4 on isa
 sio2: type 16550A
 fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
 fdc0: FIFO enabled, 8 bytes threshold
 fd0: 1.44MB 3.5in
 wdc0 at 0x1f0-0x1f7 irq 14 on isa
 wdc0: unit 1 (atapi): <FX001DE/K01>, removable, intr, iordis
 acd0: drive speed 689KB/sec, 128KB cache
 acd0: supported read types:
 acd0: Audio: play, 255 volume levels
 acd0: Mechanism: ejectable tray
 acd0: Medium: no/blank disc inside, unlocked
 ppc0 at 0x378 irq 7 on isa
 ppc0: PC87332 chipset (NIBBLE-only) in COMPATIBLE mode
 ex0 at 0x2a0-0x2af irq 10 on isa
 ex0: Intel EtherExpress Pro/10, address 00:aa:00:ad:61:fd, connector TPE
 ex1 at 0x2b0-0x2bf irq 11 on isa
 ex1: Intel EtherExpress Pro/10, address 00:aa:00:ad:64:f0, connector TPE
 ex2 at 0x2c0-0x2cf irq 3 on isa
 ex2: Intel EtherExpress Pro/10, address 00:aa:00:ae:5c:e7, connector TPE
 ex3 at 0x2d0-0x2df irq 5 on isa
 ex3: Intel EtherExpress Pro/10, address 00:aa:00:ad:63:8e, connector TPE
 vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
 npx0 on motherboard
 npx0: INT 16 interface
 ccd0-3: Concatenated disk drivers
 IP packet filtering initialized, divert enabled, rule-based forwarding enabled, logging disabled
 DUMMYNET initialized (990504)
 IP Filter: initialized.  Default = pass all, Logging = enabled
 Waiting 15 seconds for SCSI devices to settle
 da0 at ahc0 bus 0 target 0 lun 0
 da0: <IBM OEM 0662S12 3 30> Fixed Direct Access SCSI-2 device 
 da0: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
 da0: 1003MB (2055035 512 byte sectors: 64H 32S/T 1003C)
 da1 at ahc0 bus 0 target 1 lun 0
 da1: <IBM OEM 0662S12 1011> Fixed Direct Access SCSI-2 device 
 da1: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
 da1: 1003MB (2055035 512 byte sectors: 64H 32S/T 1003C)
 da3 at ahc0 bus 0 target 3 lun 0
 da3: <SEAGATE ST31230N 0594> Fixed Direct Access SCSI-2 device 
 da3: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
 da3: 1010MB (2069860 512 byte sectors: 64H 32S/T 1010C)
 da2 at ahc0 bus 0 target 2 lun 0
 da2: <HP 1.050 GB #A2 0180> Fixed Direct Access SCSI-2 device 
 da2: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
 da2: 1001MB (2051460 512 byte sectors: 64H 32S/T 1001C)
 changing root device to da0s1a
 
 --------------225CBA2E802707D70362173B--
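
The test output above (tv_usec = -695331771) confirms that the bad
value comes from gettimeofday(2) itself.  A long-running caller could
guard its select(2) timeout against such input.  The sketch below is one
possible approach, not the fix adopted for this PR: clamp_timeout and its
max_sec parameter are hypothetical names, and resetting an out-of-range
tv_usec to zero is an arbitrary choice made for the example.

```c
#include <sys/time.h>

/*
 * Force a timeout into a range select(2) can accept.
 * - tv_usec outside [0, 1000000) is reset to 0.
 * - tv_sec is clamped to [0, max_sec] (max_sec is a caller-chosen bound).
 * Returns 1 if the timeout had to be modified, 0 if it was already sane.
 */
static int
clamp_timeout(struct timeval *tv, long max_sec)
{
	int changed = 0;

	if (tv->tv_usec < 0 || tv->tv_usec >= 1000000) {
		tv->tv_usec = 0;
		changed = 1;
	}
	if (tv->tv_sec < 0) {
		tv->tv_sec = 0;
		changed = 1;
	} else if (tv->tv_sec > max_sec) {
		tv->tv_sec = max_sec;
		changed = 1;
	}
	return changed;
}
```

A caller would invoke clamp_timeout(&wtime, 30) just before select(2)
and log when it returns 1, so the underlying clock problem stays visible
instead of being silently masked.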
 
 
State-Changed-From-To: open->closed 
State-Changed-By: ru 
State-Changed-When: Wed Sep 29 01:52:27 PDT 1999 
State-Changed-Why:  
Superseded by PR kern/14034. 
>Unformatted:
 
