From newton@cleese.apana.org.au  Tue Jan 24 23:59:42 1995
Received: from cleese.apana.org.au (root@cleese.apana.org.au [192.203.213.4]) by freefall.cdrom.com (8.6.9/8.6.6) with SMTP id XAA04464 for <FreeBSD-gnats-submit@freebsd.org>; Tue, 24 Jan 1995 23:59:20 -0800
Received: by cleese.apana.org.au id AA06569
  (5.67a/IDA-1.5 for FreeBSD-gnats-submit@freebsd.org); Wed, 25 Jan 1995 18:29:43 +1030
Message-Id: <199501250759.AA06569@cleese.apana.org.au>
Date: Wed, 25 Jan 1995 18:29:43 +1030
From: newton@cleese.apana.org.au
Reply-To: newton@cleese.apana.org.au
To: FreeBSD-gnats-submit@freebsd.org
Subject: Kernel bugs in 2.0-RELEASE
X-Send-Pr-Version: 3.2

>Number:         185
>Category:       kern
>Synopsis:       kernel stability problems - can't sustain uptimes > 2 days
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          support
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jan 25 00:00:00 1995
>Closed-Date:    Sat Jan 6 10:19:32 PST 1996
>Last-Modified:  Sat Jan  6 10:22:52 PST 1996
>Originator:     Mark Newton
>Release:        FreeBSD 2.0-RELEASE i386
>Organization:
cleese.apana.org.au public access UNIX
>Environment:

Hmm - I guess the best description of the environment here can be
given by dmesg.  Summary:  i486DX2/66, 16MB RAM, ISA-bus, FreeBSD
2.0-RELEASE, 10 serial ports running modems and terminals at speeds
up to and including 38.4kbps, 2 IDE disks, 3 SCSI disks, NE2000 ethernet.
The SCSI controller is an Ultrastor 34F with the latest revision firmware.

newton@cleese> dmesg
FreeBSD 2.0-RELEASE #0: Thu Jan 19 19:40:21 CST 1995
    root@cleese.apana.org.au:/usr/src/sys/compile/CLEESE
CPU: i486DX (486-class CPU)  Id = 0x435  Origin = "GenuineIntel"
real memory  = 16384000 (4000 pages)
avail memory = 15093760 (3685 pages)
using 290 buffers containing 2379776 bytes of memory
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <4 virtual consoles>
ed0 at 0x300-0x31f irq 10 on isa
ed0: address 00:00:21:40:97:73, type NE2000 (16 bit) 
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16450
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16450
sio2 at 0x280-0x287 irq 5 flags 0x201 on isa
sio2: type 16550A (multiport master)
sio3 at 0x288-0x28f flags 0x201 on isa
sio3: type 16550A (multiport)
sio4 at 0x290-0x297 flags 0x201 on isa
sio4: type 16550A (multiport)
sio5 at 0x298-0x29f flags 0x201 on isa
sio5: type 16550A (multiport)
sio6 at 0x2a0-0x2a7 flags 0x201 on isa
sio6: type 16550A (multiport)
sio7 at 0x2a8-0x2af flags 0x201 on isa
sio7: type 16550A (multiport)
sio8 at 0x2b0-0x2b7 flags 0x201 on isa
sio8: type 16550A (multiport)
sio9 at 0x2b8-0x2bf flags 0x201 on isa
sio9: type 16550A (multiport)
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: [0: fd0: 1.44MB 3.5in]
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC2340H>
wd0: 325MB (666600 total sec), 1010 cyl, 12 head, 55 sec, bytes/sec 512
wdc0: unit 1 (wd1): <Conner Peripherals 240MB - CP30254>
wd1: 240MB (492250 total sec), 895 cyl, 10 head, 55 sec, bytes/sec 512
uha0: reading board settings, dma=5 int=11 id=7
uha0 at 0x330-0x33f irq 11 drq 5 on isa
uha0 targ 0 lun 0: type 0(direct) fixed SCSI1
uha0 targ 0 lun 0: <MAXTOR  XT-4380S        B3D1>
sd0: 312MB (639450 total sec), 1224 cyl, 15 head, 34 sec, bytes/sec 512
uha0 targ 3 lun 0: type 0(direct) fixed SCSI2
uha0 targ 3 lun 0: <SEAGATE ST3550N         9416>
sd1: 435MB (891574 total sec), 2128 cyl, 5 head, 83 sec, bytes/sec 512
uha0 targ 4 lun 0: type 0(direct) fixed SCSI2
uha0 targ 4 lun 0: <QUANTUM ELS170S         3.09>
sd2: 163MB (333936 total sec), 1536 cyl, 4 head, 54 sec, bytes/sec 512
uha0 targ 5 lun 0: type 1(sequential) removable SCSI1
uha0 targ 5 lun 0: <ARCHIVE VIPER 150  21247-005>
st0: Archive  Viper 150 is a known rogue
st0: density code 0x0, 512-byte blocks, write-protected
npx0 on motherboard
newton@cleese> pstat -s
Device      512-blocks     Used    Avail Capacity  Type
/dev/wd1b        79200    19440    59760    25%    Interleaved
/dev/sd1b        39840    18992    20848    48%    Interleaved
/dev/sd2b        40176    18696    21480    47%    Interleaved
/dev/wd0b        39600    18944    20656    48%    Interleaved
Total           198816    76072   122744    38%
newton@cleese> df
Filesystem       1K-blocks     Used    Avail Capacity  Mounted on
/dev/wd0a            19079    12001     6124    66%    /
/dev/sd1a           406382   277012    88731    76%    /usr
/dev/wd0f           166105   143954    13845    91%    /var
/dev/wd1a           199465   170039     9479    95%    /local1
/dev/wd0e           117375    87373    24133    78%    /local2
/dev/sd2a           140634    81964    51638    61%    /local3
/dev/sd0a           292188   202093    60876    77%    /local4
procfs                   4        4        0   100%    /proc
kernfs                   1        1        0   100%    /kern
seldon:/u/adrian    521318   489442    21450    96%    /local1/users/adrian
newton@cleese> lsdev
Device     State           Description
---------- --------------- --------------------------------------------------
isa0       Busy            ISA or EISA bus
sc0        Unknown         Graphics console
ed0        Busy            NE2000
sio0       Unknown         RS-232 serial port
sio1       Unknown         RS-232 serial port
sio2       Unknown         RS-232 serial port
sio3       Unknown         RS-232 serial port
sio4       Unknown         RS-232 serial port
sio5       Unknown         RS-232 serial port
sio6       Unknown         RS-232 serial port
sio7       Unknown         RS-232 serial port
sio8       Unknown         RS-232 serial port
sio9       Unknown         RS-232 serial port
lpt0       Unknown         Parallel printer adapter
fdc0       Unknown         floppy disk/tape controller
fd0        Unknown         floppy disk
wdc0       Unknown         ST506/ESDI/IDE disk controller
wd0        Unknown         ST506/ESDI/IDE disk
wd1        Unknown         ST506/ESDI/IDE disk
uha0       Busy            UltraStore 14F or 34F SCSI host adapter
scbus0     Busy            SCSI subsystem
sd0        Unknown         SCSI disk
sd1        Unknown         SCSI disk
sd2        Unknown         SCSI disk
st0        Unknown         SCSI tape drive
npx0       Unknown         Floating-point unit

>Description:

If quotas are enabled, the kernel spontaneously reboots after about
five hours of uptime.  With quotas disabled, I get about a day and
a half before it hangs 'til it gets a manual reboot (yes, I know,
that sounds like two completely unrelated problems).


>How-To-Repeat:

Boot 'er up and leave it running multiuser for a couple of days.
Unfortunately, I can't isolate it to a single bit of code:  I have
binaries here from FreeBSD 1.0, 1.1 and 2.0 being banged on by 
about 300 users (not all at once :-).  Any one of them at any time
could be triggering it.

The system is also acting as a secondary nameserver, news server for
about 3 dozen sites, secondary MX forwarder for about 70 UUCP, dialup
SLIP and UUCP sites, NFS server for a client with far too little disk
space, proxy caching HTTP server, a MOO, fileserver for a Sun 3/60
acting as an X11R6 X-terminal, and a dialup SLIP/PPP server.  I guess
what I'm trying to say is that with everything else that's going on,
I haven't been able to narrow down the problem at all.

I'm not being very helpful, am I? :-)

>Fix:
	
The quota problem is worked around by... well... disabling quotas :-/
I haven't yet worked out what to do about the hangs, other than asking
my users to please put up with it 'til we can work something out.  
Unfortunately, when a system is catatonic, you can't break out to the 
kernel debugger to find out why...

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: gibbs 
State-Changed-When: Sat Jan 6 10:19:32 PST 1996 
State-Changed-Why:  
Most likely fixed by 2.0.5 or 2.1.  Even if it's not, the
information in this PR is out of date and a new PR should
be logged against -stable or -current if the problem still
exists.

>Unformatted:


