From nomad@crow.ee.washington.edu  Thu Mar 27 23:55:57 2008
Return-Path: <nomad@crow.ee.washington.edu>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9FD49106564A
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 27 Mar 2008 23:55:57 +0000 (UTC)
	(envelope-from nomad@crow.ee.washington.edu)
Received: from crow.ee.washington.edu (crow.ee.washington.edu [128.208.232.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 7F59C8FC1D
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 27 Mar 2008 23:55:57 +0000 (UTC)
	(envelope-from nomad@crow.ee.washington.edu)
Received: from goose.ee.washington.edu (goose.ee.washington.edu [128.208.232.11])
	by crow.ee.washington.edu (8.13.1/8.13.3) with ESMTP id m2RNtsIU019424
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 27 Mar 2008 16:55:54 -0700
Received: from goose.ee.washington.edu (localhost [127.0.0.1])
	by goose.ee.washington.edu (8.14.2/8.12.10) with ESMTP id m2RNts2W004226;
	Thu, 27 Mar 2008 16:55:54 -0700 (PDT)
Received: (from nomad@localhost)
	by goose.ee.washington.edu (8.14.2/8.14.2/Submit) id m2RNtsb9004225;
	Thu, 27 Mar 2008 16:55:54 -0700 (PDT)
	(envelope-from nomad)
Message-Id: <200803272355.m2RNtsb9004225@goose.ee.washington.edu>
Date: Thu, 27 Mar 2008 16:55:54 -0700 (PDT)
From: Lee Damon <nomad@ssli-mail.ee.washington.edu>
Reply-To: Lee Damon <nomad@ssli-mail.ee.washington.edu>
To: FreeBSD-gnats-submit@freebsd.org
Cc: nomad@crow.ee.washington.edu
Subject: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         122172
>Category:       bin
>Synopsis:       [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-fs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 28 00:00:03 UTC 2008
>Closed-Date:    
>Last-Modified:  Tue May 27 02:00:07 UTC 2008
>Originator:     Lee Damon
>Release:        FreeBSD 6.3-STABLE i386
>Organization:
Univ. of Washington Electrical Engr, SSLI LAB
>Environment:
System: FreeBSD goose.ee.washington.edu 6.3-STABLE FreeBSD 6.3-STABLE #6: Wed
Mar 26 17:03:35 PDT 2008 root@goose.ee.washington.edu:/usr/obj/usr/src/sys/NIKO
i386

goose was CVSup'd, rebuilt (buildworld, buildkernel) and installed around
15:00 PDT on 26 MAR, 2008 in an attempt to solve the problem.  The problem
first showed up after the CVSup, buildworld and buildkernel done on 14
FEB, 2008 at 11:32 PST.

The other i386 system with the same problem and the 8 amd64 systems that
don't have the problem were all CVSup'd and built on 14 FEB, 2008 at 11:32
PST.

        
>Description:
 amd(8) is launched on boot (or later) and runs briefly then aborts.
 If it is launched on boot then it never gets past reclaiming all
 the children it starts to help it boot up.  One of the children
 (or the parent in some cases) aborts with a SIG 11.

 The attached gdb output (and the truss output mentioned below) were
 obtained by starting amd manually after boot.  In that case it gets
 past the point where the children finish setup but eventually dies,
 sometimes with SIG 10, sometimes with SIG 11.

 I have a truss output and amd log file available but gnats thought
 they were too big to include in the pr email. The core file and
 amd binary are available for examination if needed.

 The amd.conf and map files are the same on all 10 systems.

>How-To-Repeat:
 Configure and launch amd on an i386 6.3-STABLE system.

>Fix:

        none known.


--- gdb.out begins here ---
Script started on Thu Mar 27 14:42:26 2008
goose# gdb -c amd.core amd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `amd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/X11R6/lib/nss_ldap.so.1...done.
Loaded symbols for /usr/X11R6/lib/nss_ldap.so.1
Reading symbols from /usr/local/lib/libldap-2.3.so.2...done.
Loaded symbols for /usr/local/lib/libldap-2.3.so.2
Reading symbols from /usr/local/lib/liblber-2.3.so.2...done.
Loaded symbols for /usr/local/lib/liblber-2.3.so.2
Reading symbols from /usr/local/lib/libgssapi_krb5.so...done.
Loaded symbols for /usr/local/lib/libgssapi_krb5.so
Reading symbols from /usr/local/lib/libssl.so.5...done.
Loaded symbols for /usr/local/lib/libssl.so.5
Reading symbols from /usr/local/lib/libcrypto.so.5...done.
Loaded symbols for /usr/local/lib/libcrypto.so.5
Reading symbols from /usr/local/lib/libkrb5.so...done.
Loaded symbols for /usr/local/lib/libkrb5.so
Reading symbols from /usr/local/lib/libk5crypto.so...done.
Loaded symbols for /usr/local/lib/libk5crypto.so
Reading symbols from /usr/local/lib/libcom_err.so...done.
Loaded symbols for /usr/local/lib/libcom_err.so
Reading symbols from /usr/local/lib/libkrb5support.so...done.
Loaded symbols for /usr/local/lib/libkrb5support.so
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307	    if (fp->fh_fs == fs || fs == NULL) {
(gdb) bt
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
#1  0x0805272d in amqproc_setopt_1_svc (argp=0xbfbfe4a0, rqstp=0xbfbfe9c0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_subr.c:157
#2  0x0805337b in amq_program_1 (rqstp=0xbfbfe9c0, transp=0x80b9080)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_svc.c:215
#3  0x28112673 in svc_getreq_common () from /lib/libc.so.6
#4  0x281126e8 in svc_getreqset () from /lib/libc.so.6
#5  0x0805c2a5 in run_rpc ()
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:294
#6  0x0805c505 in mount_automounter (ppid=2487)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:448
#7  0x0804deaa in main (argc=5, argv=0xbfbfecd0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amd.c:564
(gdb) where
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
#1  0x0805272d in amqproc_setopt_1_svc (argp=0xbfbfe4a0, rqstp=0xbfbfe9c0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_subr.c:157
#2  0x0805337b in amq_program_1 (rqstp=0xbfbfe9c0, transp=0x80b9080)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_svc.c:215
#3  0x28112673 in svc_getreq_common () from /lib/libc.so.6
#4  0x281126e8 in svc_getreqset () from /lib/libc.so.6
#5  0x0805c2a5 in run_rpc ()
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:294
#6  0x0805c505 in mount_automounter (ppid=2487)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:448
#7  0x0804deaa in main (argc=5, argv=0xbfbfecd0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amd.c:564
(gdb) goose# exit
exit

Script done on Thu Mar 27 14:42:38 2008
--- gdb.out ends here ---

--- gdb1.out begins here ---
Script started on Thu Mar 27 15:34:48 2008
goose# gdb -c amd.core amd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `amd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/X11R6/lib/nss_ldap.so.1...done.
Loaded symbols for /usr/X11R6/lib/nss_ldap.so.1
Reading symbols from /usr/local/lib/libldap-2.3.so.2...done.
Loaded symbols for /usr/local/lib/libldap-2.3.so.2
Reading symbols from /usr/local/lib/liblber-2.3.so.2...done.
Loaded symbols for /usr/local/lib/liblber-2.3.so.2
Reading symbols from /usr/local/lib/libgssapi_krb5.so...done.
Loaded symbols for /usr/local/lib/libgssapi_krb5.so
Reading symbols from /usr/local/lib/libssl.so.5...done.
Loaded symbols for /usr/local/lib/libssl.so.5
Reading symbols from /usr/local/lib/libcrypto.so.5...done.
Loaded symbols for /usr/local/lib/libcrypto.so.5
Reading symbols from /usr/local/lib/libkrb5.so...done.
Loaded symbols for /usr/local/lib/libkrb5.so
Reading symbols from /usr/local/lib/libk5crypto.so...done.
Loaded symbols for /usr/local/lib/libk5crypto.so
Reading symbols from /usr/local/lib/libcom_err.so...done.
Loaded symbols for /usr/local/lib/libcom_err.so
Reading symbols from /usr/local/lib/libkrb5support.so...done.
Loaded symbols for /usr/local/lib/libkrb5support.so
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307	    if (fp->fh_fs == fs || fs == NULL) {
(gdb) frame 0
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307	    if (fp->fh_fs == fs || fs == NULL) {
(gdb) list
302	flush_nfs_fhandle_cache(fserver *fs)
303	{
304	  fh_cache *fp;
305	
306	  ITER(fp, fh_cache, &fh_head) {
307	    if (fp->fh_fs == fs || fs == NULL) {
308	      /*
309	       * Only invalidate port info for non-WebNFS servers
310	       */
311	      if (!(fp->fh_fs->fs_flags & FSF_WEBNFS))
(gdb) info frame
Stack level 0, frame at 0xbfbfe450:
 eip = 0x805d8fa in flush_nfs_fhandle_cache
    (/usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307); 
    saved eip 0x805272d
 called by frame at 0xbfbfe470
 source language c.
 Arglist at 0xbfbfe448, args: fs=0x0
 Locals at 0xbfbfe448, Previous frame's sp is 0xbfbfe450
 Saved registers:
  ebp at 0xbfbfe448, eip at 0xbfbfe44c
(gdb) info args
fs = (fserver *) 0x0
(gdb) info locals
fp = (fh_cache *) 0x8
(gdb) print fp
$1 = (fh_cache *) 0x8
(gdb) 

Script done on Thu Mar 27 15:35:19 2008
--- gdb1.out ends here ---
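
For readers following the trace: fs=0x0 is probably not the fault by itself,
since the test on line 307 appears to treat a NULL fs as "match every entry".
The access that can fault is fp->fh_fs, which is evaluated first and
dereferences fp (reported as 0x8).  The sketch below is a simplified model of
that loop, not the am-utils source; the *_model types and the ring_init,
ring_append and flush_model helpers are hypothetical names introduced only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for amd's queue link, fserver and fh_cache. */
struct qelem_model {
  struct qelem_model *q_forw;
  struct qelem_model *q_back;
};

struct fserver_model {
  int fs_flags;
};

struct fh_cache_model {
  struct qelem_model fh_q;      /* first member, so a link can be cast to an entry */
  struct fserver_model *fh_fs;  /* server this cached handle belongs to */
  int fh_flushed;               /* set when the entry has been "flushed" */
};

/* An empty ring is a head that links to itself in both directions. */
void ring_init(struct qelem_model *head)
{
  head->q_forw = head;
  head->q_back = head;
}

/* Append an entry at the tail of the ring. */
void ring_append(struct qelem_model *head, struct fh_cache_model *e)
{
  struct qelem_model *n = &e->fh_q;

  n->q_forw = head;
  n->q_back = head->q_back;
  head->q_back->q_forw = n;
  head->q_back = n;
}

/*
 * Same shape as the test on line 307: fp->fh_fs is read before fs is
 * compared with NULL, so fp must be a valid entry on every iteration.
 * A NULL fs matches every entry.  Returns the number of entries flushed.
 */
int flush_model(struct qelem_model *head, const struct fserver_model *fs)
{
  struct qelem_model *q;
  int flushed = 0;

  assert(head != NULL);
  for (q = head->q_forw; q != head; q = q->q_forw) {
    struct fh_cache_model *fp = (struct fh_cache_model *) q;

    if (fp->fh_fs == fs || fs == NULL) {
      fp->fh_flushed = 1;
      flushed++;
    }
  }
  return flushed;
}
```

With an intact ring, flush_model(&head, NULL) simply visits every entry.  In
this model a fault like the one in the trace would need a damaged forward link
in the ring rather than the NULL argument.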


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-i386->freebsd-fs 
Responsible-Changed-By: remko 
Responsible-Changed-When: Sat Apr 5 08:11:45 UTC 2008 
Responsible-Changed-Why:  
The backtraces show that amd(8) has a problem; reassign to the 
fs team to investigate. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=122172 

From: John Hein <jhein@timing.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
Date: Mon, 7 Apr 2008 21:16:37 -0600

 This doesn't help your problem directly, but we've been using amd with
 NIS maps and 6.3/i386 without any problems.  What's your configuration?
 
 You might have to debug a little further to find out how fp ends up
 as an invalid pointer (0x8 in your gdb output).
 
 You could also try the newer version of am-utils in ports just
 to see if it behaves differently.
 
 Have you tried searching back from your cvsup date to see when
 it stops seg faulting for you?

From: John Hein <jhein@timing.com>
To: Lee Damon <nomad@crow.ee.washington.edu>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE
 i386, fine on amd6
Date: Tue, 8 Apr 2008 11:52:18 -0600

 Lee Damon wrote at 09:40 -0700 on Apr  8, 2008:
  > John Hein wrote:
  > > This doesn't help your problem directly, but we've been using amd with
  > > NIS maps and 6.3/i386 without any problems.  What's your configuration?
  > 
  > The maps are flat files but we use LDAP.
  > 
  > > You could also try the newer version of am-utils in ports just
  > > to see if it behaves differently.
  > 
  > thanks for the hints.  Sadly the version in the ports tree died the same 
  > horrible death.
 
 You should put that information in the PR (CC restored).
 
 
  > > Have you tried searching back from your cvsup date to see when
  > > it stops seg faulting for you?
  > 
  > These are production machines, I can't take them down for the time it 
  > would take to do that :(
 
 Unfortunately, all I have are debugging suggestions...
 
  - Bring up a non-production machine to play with.
 
  - Bring up a virtual machine or jail to play with.
 
  - Start with a bare bones amd config (e.g., without anything
    but the default maps & .conf files).  If there's no core
    dump, then add back parts of your config until it dies.
 
  - Compile amd with debug on and turn up the debug level to
    see if you get any hints.
 
  - Trace deeper into the code to find the source of the bad pointer.
 
  - Try asking on the am-utils mailing list.
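 
 As one way to approach the "trace deeper" suggestion, a debug build could
 check a queue's links before walking it, so that a bad link is reported
 where it is first seen instead of crashing later.  The following is only
 a sketch under assumptions: struct qlink, LOW_ADDR_LIMIT and
 find_bad_link are hypothetical names, not part of am-utils, and the
 4096-byte limit is an arbitrary guess at "too small to be a real object".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a doubly linked queue element. */
struct qlink {
  struct qlink *q_forw;
  struct qlink *q_back;
};

/* Assumption: addresses below this cannot belong to a real object. */
#define LOW_ADDR_LIMIT ((uintptr_t) 4096)

/*
 * Walk the ring starting at head.  Return the position of the first
 * suspicious link (0 is the link leaving head), or -1 if the ring
 * closes cleanly with at most max_nodes entries.
 */
int find_bad_link(const struct qlink *head, int max_nodes)
{
  const struct qlink *prev = head;
  const struct qlink *cur;
  int pos = 0;

  assert(head != NULL);
  cur = head->q_forw;
  while (cur != head) {
    if ((uintptr_t) cur < LOW_ADDR_LIMIT)  /* NULL or a small value such as 0x8 */
      return pos;
    if (cur->q_back != prev)               /* back link disagrees with the path taken */
      return pos;
    if (pos >= max_nodes)                  /* more entries than expected, or no way back to head */
      return pos;
    prev = cur;
    cur = cur->q_forw;
    pos++;
  }
  /* The head's back link should point at the last entry visited. */
  return (head->q_back == prev) ? -1 : pos;
}
```

 Calling something like this on the cache queue just before the loop at
 ops_nfs.c:307, and logging the result, might narrow down whether the
 queue is already damaged when the flush request arrives.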

From: Lee Damon <nomad@castle.org>
To: bug-followup@FreeBSD.org, nomad@crow.ee.washington.edu
Cc:  
Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE
 i386, fine on amd6
Date: Tue, 08 Apr 2008 10:59:27 -0700

  > You could also try the newer version of am-utils in ports just
  > to see if it behaves differently.
 
 Just tried, same failure (exited with signal 10).  Corefile & binary are 
 available if you want them but the port compile defaulted to no 
 debugging and I forgot to turn it on so there's not a lot of information 
 there.  Since these are both production machines and amd crashing 
 requires the host to reboot I can't easily test again.
 
 nomad
>Unformatted:
