From xyst@straynet.com  Sat Jun 17 10:41:14 2000
Return-Path: <xyst@straynet.com>
Received: from straynet.com (voyager.straynet.com [208.185.24.8])
	by hub.freebsd.org (Postfix) with SMTP id 3BEB637B652
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 17 Jun 2000 10:41:13 -0700 (PDT)
	(envelope-from xyst@straynet.com)
Received: (qmail 84755 invoked by uid 1013); 17 Jun 2000 17:41:29 -0000
Message-Id: <20000617174129.84754.qmail@straynet.com>
Date: 17 Jun 2000 17:41:29 -0000
From: greg@straynet.com
Sender: xyst@straynet.com
Reply-To: greg@straynet.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: fstat gives signal 10 (SIGBUS) when outputting data
X-Send-Pr-Version: 3.2

>Number:         19355
>Category:       bin
>Synopsis:       fstat gives signal 10 (SIGBUS) when outputting data
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 17 10:50:00 PDT 2000
>Closed-Date:    Sun Jan 13 10:38:45 PST 2002
>Last-Modified:  Sun Jan 13 10:39:06 PST 2002
>Originator:     Greg Prosser
>Release:        FreeBSD 3.4-STABLE i386
>Organization:
Straynet Online
>Environment:

	Straynet is a public hosting machine with many users. While running
	a script that calls sockstat, I noticed fstat was repeatedly dying on
	SIGBUS. I added -g to CFLAGS in the fstat Makefile under /usr/src and
	ran it through gdb, but still could not find the problem.

	The sources are the ones currently on the RELENG_3 branch (I checked them
	out via anoncvs and ran diffs; no changes) and are running on a
	FreeBSD-stable machine with the following uname output:

	FreeBSD voyager.straynet.com 3.4-STABLE FreeBSD 3.4-STABLE #0: Tue May 16 \
	20:16:55 EDT 2000 gregp@voyager.straynet.com:/usr/src/sys/compile/USERSSUCK \
	i386

	(Long lines wrapped with \)

>Description:

	As stated earlier, this problem showed up while I was running sockstat
	and was then isolated to fstat itself. fstat appears to crash when
	listing information for one specific user (the last user shown onscreen
	by a plain 'fstat' before it got the SIGBUS), and the behaviour repeats
	when I use fstat -u username. An example with gdb output follows.

[root@voyager] /usr/src/usr.bin/fstat: make clean all
[root@voyager] /usr/src/usr.bin/fstat: cd /usr/obj/usr/src/usr.bin/fstat
[root@voyager] /usr/obj/usr/src/usr.bin/fstat: gdb ./fstat
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(gdb) run -u bin2ooo
Starting program: /usr/obj/usr/src/usr.bin/fstat/./fstat -u bin2ooo
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
bin2ooo  bnc        10558 root /             2 drwxr-xr-x    1024  r
bin2ooo  bnc        10558   wd /home    1714378 drwxr-xr-x     512  r
bin2ooo  bnc        10558 text /usr     3190287 -rwxr-xr-x   79658  r
bin2ooo  bnc        10558    0 /          6766 crw--w----   ttyp2 rw
bin2ooo  bnc        10558    1 /          6766 crw--w----   ttyp2 rw
bin2ooo  bnc        10558    2 /          6766 crw--w----   ttyp2 rw
bin2ooo  bnc        10558    3* internet stream tcp dc5b9180
bin2ooo  bnc        10558    4 /home    1714190 -rw-r--r--   25717  w
bin2ooo  bnc        10558    6* internet stream tcp dc5d4840
bin2ooo  bnc        10558    7* internet stream tcp dc5ff2a0
bin2ooo  bash       10547 text /usr     3190357 -rwxr-xr-x  367780  r

Program received signal SIGBUS, Bus error.
0x280cd832 in bcopy () from /usr/lib/libc.so.3
(gdb) bt
#0  0x280cd832 in bcopy () from /usr/lib/libc.so.3
#1  0x5 in ?? ()
#2  0x8048e80 in main (argc=3, argv=0xbfbfdbe8)
    at /usr/src/usr.bin/fstat/fstat.c:265
#3  0x80489f5 in _start ()
(gdb) up
#1  0x5 in ?? ()
(gdb) up
#2  0x8048e80 in main (argc=3, argv=0xbfbfdbe8)
    at /usr/src/usr.bin/fstat/fstat.c:265
265                     dofiles(p);
(gdb) list
260                     putchar('\n');
261
262             for (plast = &p[cnt]; p < plast; ++p) {
263                     if (p->kp_proc.p_stat == SZOMB)
264                             continue;
265                     dofiles(p);
266             }
267             exit(0);
268     }
269
(gdb) quit
The program is running.  Exit anyway? (y or n) y
[root@voyager] /usr/obj/usr/src/usr.bin/fstat: ps uwxU bin2ooo
USER      PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
bin2ooo 10547  0.0  0.0     0    0  p2  IEs+ -         0:00.00  (bash)
bin2ooo 10558  0.0  0.1  1000  576  ??  Is   Thu08PM   0:03.63 bnc
[root@voyager] /usr/obj/usr/src/usr.bin/fstat:

>How-To-Repeat:

	I'm not sure whether this can be reproduced on other systems. I
	haven't been able to track down the error myself, so I can't pinpoint
	where it's failing or reproduce it elsewhere, but for the last ten
	minutes the same action has triggered it again and again. More info
	is available upon request (if it's still failing when you request
	it :))

>Fix:
	
	I'm wondering if this is a memory failure somewhere in fstat? I'm
	no FreeBSD hacker, so I don't have the slightest clue.

>Release-Note:
>Audit-Trail:

From: Greg Prosser <greg@straynet.com>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: bin/19355: fstat gives signal 10 (SIGBUS) when outputting data
Date: Sat, 17 Jun 2000 14:28:16 -0400 (EDT)

 Hey .. I was playing around a little more with gdb, narrowed it down to
 the exact line, and have the values of the variables involved ..
 
 (gdb) step
 350                     bcopy(filed0.fd_dfiles, ofiles, (filed.fd_lastfile+1) * FPSIZE);
 (gdb) p filed0
 $3 = {fd_fd = {fd_ofiles = 0xc8128d80, fd_ofileflags = 0x0, fd_cdir = 0x0,
     fd_rdir = 0x0, fd_nfiles = 0, fd_lastfile = 6922, fd_freefile = 12635,
     fd_cmask = 12859, fd_refcnt = 29236}, fd_dfiles = {0x32325b1b, 0x1b48313b,
     0x20204b5b, 0x20202020, 0x20202020, 0x20202020, 0x20202020, 0x2f232020,
     0x4057753c, 0x23205469, 0x57753c2f, 0x20546940, 0x753c2f23, 0x54694057,
     0x3c2f2320, 0x69405775, 0x2f232054, 0x4057753c, 0x23205469, 0x57753c2f},
   fd_dfileflags = "@iT\e[K\e[1;22r\e[22;1H"}
 (gdb) p filed0.fd_fd.fd_lastfile
 $4 = 6922
 (gdb) p ofiles
 $5 = (struct file **) 0x8068000
 (gdb) p *ofiles
 $6 = (struct file *) 0x0
 (gdb) p (filed0.fd_fd.fd_lastfile+1)
 $7 = 6923
 [note: FPSIZE must be a #define; I got several errors when printing the whole
 expression]
 (gdb) p filed0.fd_dfiles
 $8 = {0x32325b1b, 0x1b48313b, 0x20204b5b, 0x20202020, 0x20202020, 0x20202020,
   0x20202020, 0x2f232020, 0x4057753c, 0x23205469, 0x57753c2f, 0x20546940,
   0x753c2f23, 0x54694057, 0x3c2f2320, 0x69405775, 0x2f232054, 0x4057753c,
   0x23205469, 0x57753c2f}
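
One possible reading of the fd_dfiles dump above: taken as little-endian bytes (this is i386), the "pointers" decode to printable text and terminal escape sequences, much like the fd_dfileflags string gdb shows. That would suggest the structure fstat read was stale memory already reused for something else, though this is only an inference from the dump. A minimal sketch of the decoding follows; words_to_text and dfiles_sample are hypothetical names used for illustration and are not part of fstat.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack 32-bit words in little-endian byte order (as on i386) and render
 * them as text. ESC (0x1b) is written as the two characters "\e", matching
 * gdb's notation; any other non-printable byte becomes '.'.
 * Returns the number of characters written, not counting the trailing NUL.
 */
static size_t
words_to_text(const uint32_t *words, size_t nwords, char *out, size_t outsz)
{
	size_t i, o = 0;
	int shift;

	if (outsz == 0)
		return 0;
	for (i = 0; i < nwords; i++) {
		for (shift = 0; shift < 32; shift += 8) {
			unsigned char c = (unsigned char)((words[i] >> shift) & 0xff);

			if (o + 3 > outsz)	/* keep room for "\e" plus the NUL */
				goto done;
			if (c == 0x1b) {
				out[o++] = '\\';
				out[o++] = 'e';
			} else if (c >= 0x20 && c < 0x7f) {
				out[o++] = (char)c;
			} else {
				out[o++] = '.';
			}
		}
	}
done:
	out[o] = '\0';
	return o;
}

/* Words 1-3 and 9-11 of fd_dfiles, copied from the gdb output above. */
static const uint32_t dfiles_sample[] = {
	0x32325b1b, 0x1b48313b, 0x20204b5b,
	0x4057753c, 0x23205469, 0x57753c2f,
};
```

Passing dfiles_sample to words_to_text gives a string that begins with a cursor-positioning sequence followed by an erase-line sequence, which is what one would expect from terminal output rather than from a table of struct file pointers.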
 
 I'm still puzzled .. if no requests for more info come back within the
 next 30 minutes to an hour, I'm going to kill the offending pid and end
 this. (note: I tracked down the pid by fstat'ing each of the user's
 processes).
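
The values printed in the gdb session look mutually inconsistent: fd_nfiles is 0 while fd_lastfile is 6922, and fd_dfiles has only 20 entries. If FPSIZE is sizeof(struct file *) (an assumption, 4 bytes on i386), the bcopy at line 350 would try to read 6923 * 4 = 27692 bytes from an 80-byte array, which could plausibly run off mapped memory and raise SIGBUS. The following is a minimal sketch of a sanity check that would reject such a snapshot before copying. It is not a patch against fstat.c; struct fd_snapshot and copy_inline_files are hypothetical stand-ins, and only the field names and the array length are taken from the output above.

```c
#include <stddef.h>
#include <string.h>

#define NDFILE 20	/* length of fd_dfiles as seen in the gdb dump */

struct file;		/* opaque here; only pointers are copied */

/* Hypothetical cut-down stand-in for the fields used by the copy. */
struct fd_snapshot {
	int fd_nfiles;			/* allocated descriptor slots */
	int fd_lastfile;		/* highest open descriptor */
	struct file *fd_dfiles[NDFILE];	/* inline descriptor table */
};

/*
 * Copy the inline descriptor table only if the snapshot is self-consistent.
 * Returns the number of entries copied, or -1 if the values look stale.
 */
static int
copy_inline_files(const struct fd_snapshot *fd, struct file **ofiles,
    size_t ofiles_len)
{
	int count;

	if (fd->fd_lastfile < 0)
		return 0;		/* nothing open, nothing to copy */
	if (fd->fd_lastfile >= NDFILE)
		return -1;		/* would read past fd_dfiles */
	count = fd->fd_lastfile + 1;
	if (count > fd->fd_nfiles)
		return -1;		/* more open files than slots */
	if ((size_t)count > ofiles_len)
		return -1;		/* destination too small */
	memcpy(ofiles, fd->fd_dfiles, (size_t)count * sizeof(ofiles[0]));
	return count;
}
```

With the values from this report (fd_nfiles = 0, fd_lastfile = 6922) the function returns -1 instead of copying, so a caller could skip the process and carry on with the rest of the list.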
 
 /gp
 
 .... ..   .  ... .     .       .   .     .
               g r e g @ s t r a y n e t . c o m
 .-----.----.-----.-----. senior administrator, straynet online
 |  _  |   _|  -__|  _  | head network administrator, wen dot net
 |___  |__| |_____|___  | staff consultant, micro web company
 |_____|          |_____| icq: 10405504      /    aol im: xysters
 
 
 
State-Changed-From-To: open->feedback 
State-Changed-By: iedowse 
State-Changed-When: Sat Aug 25 15:45:46 PDT 2001 
State-Changed-Why:  

Have you seen this problem occur again since? I seem to remember 
seeing something similar quite a while ago, but I never attempted 
to track it down.  

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19355 
State-Changed-From-To: feedback->closed 
State-Changed-By: iedowse 
State-Changed-When: Sun Jan 13 10:38:45 PST 2002 
State-Changed-Why:  

Feedback timeout. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19355 
>Unformatted:
