From gslin@alumni2.csie.nctu.edu.tw  Wed May 24 04:19:47 2006
Return-Path: <gslin@alumni2.csie.nctu.edu.tw>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 15ED016A4D4
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 24 May 2006 04:19:47 +0000 (UTC)
	(envelope-from gslin@alumni2.csie.nctu.edu.tw)
Received: from alumni2.csie.nctu.edu.tw (alumni2.csie.nctu.edu.tw [140.113.209.5])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 95F2F43D46
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 24 May 2006 04:19:46 +0000 (GMT)
	(envelope-from gslin@alumni2.csie.nctu.edu.tw)
Received: from alumni2.csie.nctu.edu.tw (gslin@localhost [127.0.0.1])
	by alumni2.csie.nctu.edu.tw (8.13.4/8.13.4) with ESMTP id k4O4Jm4U067775;
	Wed, 24 May 2006 12:19:48 +0800 (CST)
	(envelope-from gslin@alumni2.csie.nctu.edu.tw)
Received: (from gslin@localhost)
	by alumni2.csie.nctu.edu.tw (8.13.4/8.13.4/Submit) id k4O4JmXK067774;
	Wed, 24 May 2006 12:19:48 +0800 (CST)
	(envelope-from gslin)
Message-Id: <200605240419.k4O4JmXK067774@alumni2.csie.nctu.edu.tw>
Date: Wed, 24 May 2006 12:19:48 +0800 (CST)
From: Gea-Suan Lin <gslin@csie.nctu.edu.tw>
Reply-To: Gea-Suan Lin <gslin@csie.nctu.edu.tw>
To: FreeBSD-gnats-submit@freebsd.org
Cc: gslin@csie.nctu.edu.tw
Subject: NFS rpc.lockd will die automatically
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         97768
>Category:       bin
>Synopsis:       [nfs] NFS rpc.lockd will die automatically
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rodrigc
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed May 24 04:20:11 GMT 2006
>Closed-Date:    Tue Sep 12 17:00:44 GMT 2006
>Last-Modified:  Tue Sep 12 17:00:44 GMT 2006
>Originator:     Gea-Suan Lin
>Release:        FreeBSD 6.0-RELEASE-p7 i386
>Organization:
>Environment:
System: FreeBSD alumni2 6.0-RELEASE-p7 FreeBSD 6.0-RELEASE-p7 #4: Wed May 3 23:43:12 CST 2006 localBSD@fakealumni:/usr/obj/usr/src/sys/CSIEBSD i386


	
>Description:
- rpc.lockd (uid = daemon) dies unexpectedly on FreeBSD
  6.0-RELEASE-p7. This is the ktrace/kdump log:

  (If needed, we have full ktrace/kdump logs for both the uid=root and
   uid=daemon rpc.lockd processes.)

 58205 rpc.lockd RET   sendto 56/0x38
 58205 rpc.lockd CALL  kevent(0xd,0x80a60dc,0x1,0xbfbfdfa0,0x1,0xbfbfdf48)
 58205 rpc.lockd RET   kevent 1
 58205 rpc.lockd CALL  recvfrom(0xc,0x80a60f4,0x2260,0,0,0)
 58205 rpc.lockd GIO   fd 12 read 28 bytes
       0x0000 4470 cb8f 0000 0001 0000 0000 0000 0000 0000 0000  |Dp..................|
       0x0014 0000 0000 0000 0000                                |........|

 58205 rpc.lockd RET   recvfrom 28/0x1c
 58205 rpc.lockd CALL  close(0xd)
 58205 rpc.lockd RET   close 0
 58205 rpc.lockd CALL  close(0xc)
 58205 rpc.lockd RET   close 0
 58205 rpc.lockd CALL  gettimeofday(0xbfbfd448,0)
 58205 rpc.lockd RET   gettimeofday 0
 58205 rpc.lockd CALL  sendto(0x7,0xbfbfd920,0x4a,0,0,0)
 58205 rpc.lockd GIO   fd 7 wrote 74 bytes
       "<27>May 24 03:02:10 rpc.lockd: clntudp_create: RPC: Program not registered"
 58205 rpc.lockd RET   sendto 74/0x4a
 58205 rpc.lockd CALL  gettimeofday(0xbfbfd448,0)
 58205 rpc.lockd RET   gettimeofday 0
 58205 rpc.lockd CALL  sendto(0x7,0xbfbfd920,0x48,0,0,0)
 58205 rpc.lockd GIO   fd 7 wrote 72 bytes
       "<27>May 24 03:02:10 rpc.lockd: Unable to return result to 140.113.209.21"
 58205 rpc.lockd RET   sendto 72/0x48
 58205 rpc.lockd CALL  write(0x8,0xbfbfe680,0x20)
 58205 rpc.lockd GIO   fd 8 wrote 32 bytes
       0x0000 0100 0000 7557 0000 325c 7344 d5a2 0200 0100 0000  |....uW..2\sD........|
       0x0014 4100 0000 abbd 1428 0400 0000                      |A......(....|

 58205 rpc.lockd RET   write 32/0x20
 58205 rpc.lockd CALL  read(0x8,0xbfbfe6a0,0x194)
 58205 rpc.lockd GIO   fd 8 read 404 bytes
       0x0000 0000 0000 4881 6cc0 0300 0000 7557 0000 325c 7344  |....H.l.....uW..2\sD|
       0x0014 d5a2 0200 0200 0000 8e67 2100 0000 0000 0000 0000  |.........g!.........|
       0x0028 0000 0000 8e67 2100 0200 0100 0000 0000 0000 0000  |.....g!.............|
       0x003c 1002 0801 8c71 d115 0000 0000 0000 0000 80fd e8c1  |.....q..............|
       0x0050 00cc edc1 ffff ffff ac0a fdea d789 4dc0 00cc edc1  |..............M.....|
       0x0064 54cd edc1 0009 41c2 48bc 99c5 c00a fdea a68a 4dc0  |T.....A.H.........M.|
       0x0078 202c 6bc0 54cd edc1 0000 0000 d00a fdea d58b 4dc0  | ,k.T.............M.|
       0x008c 0600 0800 8b89 4dc0 0009 41c2 00cc edc1 838c 2002  |......M...A....... .|
       0x00a0 f0bc 99c5 48bc 99c5 340b fdea c6de 4cc0 0009 41c2  |....H...4.....L...A.|
       0x00b4 0000 0000 0100 0000 0002 0000 1c00 0000 0000 0000  |....................|
       0x00c8 4d37 0000 0200 0000 9001 0000 9001 0000 2003 0000  |M7.............. ...|
       0x00dc bc02 0000 bc02 0000 bc02 0000 0000 0000 0000 0000  |....................|
       0x00f0 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  |....................|
       0x0104 0000 0000 0000 0000 0000 0000 0000 0000 3d27 663b  |................='f;|
       0x0118 f0dd a80e 0c00 0000 4d36 3f00 9ccd e33e 0000 0000  |........M6?....>....|
       0x012c 0000 0000 0000 0000 0000 0000 0001 0000 1006 8bc4  |....................|
       0x0140 0000 0000 0001 0000 ac0b fdea 6b14 51c0 3406 8bc4  |............k.Q.4...|
       0x0154 cc05 8bc4 4d01 0000 1983 66c0 0000 0000 0400 0000  |....M.....f.........|
       0x0168 cc05 8bc4 0000 0000 700c fdea f409 5ac0 1006 8bc4  |........p.....Z.....|
       0x017c 0001 0000 0000 0000 800b 6bc0 cc05 8bc4 0000 0000  |..........k.........|
       0x0190 0000 0000                                          |....|

 58205 rpc.lockd RET   read 404/0x194
 58205 rpc.lockd CALL  gettimeofday(0xbfbfe1a8,0)
 58205 rpc.lockd RET   gettimeofday 0
 58205 rpc.lockd CALL  open(0x2814e537,0,0x1b6)
 58205 rpc.lockd NAMI  "/etc/netconfig"
 58205 rpc.lockd RET   open 12/0xc
 58205 rpc.lockd CALL  fstat(0xc,0xbfbfe040)
 58205 rpc.lockd RET   fstat 0
 58205 rpc.lockd CALL  read(0xc,0x80a4000,0x1000)
 58205 rpc.lockd GIO   fd 12 read 783 bytes
       "# $FreeBSD: src/etc/netconfig,v 1.3 2002/12/16 22:24:25 mbr Exp $
	#
	# The network configuration file. This file is currently only used in
	# conjunction with the (TI-) RPC code in the C library, unlike its
	# use in SVR4.
	#
	# Entries consist of:
	#
	#       <network_id> <semantics> <flags> <protofamily> <protoname> \\
	#               <device> <nametoaddr_libs>
	#
	# The <device> and <nametoaddr_libs> fields are always empty in FreeBSD.
	#
	udp6       tpi_clts      v     inet6    udp     -       -
	tcp6       tpi_cots_ord  v     inet6    tcp     -       -
	udp        tpi_clts      v     inet     udp     -       -
	tcp        tpi_cots_ord  v     inet     tcp     -       -
	rawip      tpi_raw       -     inet      -      -       -
	local      tpi_cots_ord  -     loopback  -      -       -
       "
 58205 rpc.lockd RET   read 783/0x30f
 58205 rpc.lockd CALL  close(0xc)
 58205 rpc.lockd RET   close 0
 58205 rpc.lockd CALL  socket(0x2,0x2,0x11)
 58205 rpc.lockd RET   socket 12/0xc
 58205 rpc.lockd CALL  getsockname(0xc,0xbfbfde60,0xbfbfde5c)
 58205 rpc.lockd RET   getsockname 0
 58205 rpc.lockd CALL  getsockopt(0xc,0xffff,0x1008,0xbfbfde58,0xbfbfde5c)
 58205 rpc.lockd RET   getsockopt 0
 58205 rpc.lockd CALL  getsockname(0xc,0xbfbfde40,0xbfbfde3c)
 58205 rpc.lockd RET   getsockname 0
 58205 rpc.lockd CALL  getsockopt(0xc,0,0x13,0xbfbfde34,0xbfbfde38)
 58205 rpc.lockd RET   getsockopt 0
 58205 rpc.lockd CALL  setsockopt(0xc,0,0x13,0xbfbfde30,0x4)
 58205 rpc.lockd RET   setsockopt 0
 58205 rpc.lockd CALL  bind(0xc,0xbfbfde40,0x10)
 58205 rpc.lockd RET   bind -1 errno 1 Operation not permitted
 58205 rpc.lockd CALL  setsockopt(0xc,0,0x13,0xbfbfde34,0x4)
 58205 rpc.lockd RET   setsockopt 0
 58205 rpc.lockd CALL  getsockname(0xc,0xbfbfdda0,0xbfbfdd9c)
 58205 rpc.lockd RET   getsockname 0
 58205 rpc.lockd CALL  getsockopt(0xc,0xffff,0x1008,0xbfbfdd98,0xbfbfdd9c)
 58205 rpc.lockd RET   getsockopt 0
 58205 rpc.lockd CALL  gettimeofday(0xbfbfde68,0)
 58205 rpc.lockd RET   gettimeofday 0
 58205 rpc.lockd CALL  getpid
 58205 rpc.lockd RET   getpid 58205/0xe35d
 58205 rpc.lockd CALL  ioctl(0xc,FIONBIO,0xbfbfde64)
 58205 rpc.lockd RET   ioctl 0
 58205 rpc.lockd CALL  gettimeofday(0xbfbfdf90,0)
 58205 rpc.lockd RET   gettimeofday 0
 58205 rpc.lockd CALL  kqueue
 58205 rpc.lockd RET   kqueue 13/0xd
 58205 rpc.lockd CALL  sendto(0xc,0x80a6354,0x38,0,0x80a4008,0x10)
 58205 rpc.lockd GIO   fd 12 wrote 56 bytes
       0x0000 4470 14fd 0000 0000 0000 0002 0001 86a0 0000 0002  |Dp..................|
       0x0014 0000 0003 0000 0000 0000 0000 0000 0000 0000 0000  |....................|
       0x0028 0001 86b5 0000 0004 0000 0011 0000 0000            |................|

 58205 rpc.lockd RET   sendto 56/0x38
 58205 rpc.lockd CALL  kevent(0xd,0x80a40dc,0x1,0xbfbfdfc0,0x1,0xbfbfdf68)
 58205 rpc.lockd RET   kevent 1
 58205 rpc.lockd CALL  recvfrom(0xc,0x80a40f4,0x2260,0,0,0)
 58205 rpc.lockd GIO   fd 12 read 28 bytes
       0x0000 4470 14fd 0000 0001 0000 0000 0000 0000 0000 0000  |Dp..................|
       0x0014 0000 0000 0000 0000                                |........|

 58205 rpc.lockd RET   recvfrom 28/0x1c
 58205 rpc.lockd CALL  close(0xd)
 58205 rpc.lockd RET   close 0
 58205 rpc.lockd CALL  close(0xc)
 58205 rpc.lockd RET   close 0
 58205 rpc.lockd CALL  gettimeofday(0xbfbfd468,0)
 58205 rpc.lockd RET   gettimeofday 0
 58205 rpc.lockd CALL  sendto(0x7,0xbfbfd940,0x4a,0,0,0)
 58205 rpc.lockd GIO   fd 7 wrote 74 bytes
       "<27>May 24 03:02:10 rpc.lockd: clntudp_create: RPC: Program not registered"
 58205 rpc.lockd RET   sendto 74/0x4a
 58205 rpc.lockd CALL  gettimeofday(0xbfbfd468,0)
 58205 rpc.lockd RET   gettimeofday 0
 58205 rpc.lockd CALL  sendto(0x7,0xbfbfd940,0x48,0,0,0)
 58205 rpc.lockd GIO   fd 7 wrote 72 bytes
       "<27>May 24 03:02:10 rpc.lockd: Unable to return result to 140.113.209.21"
 58205 rpc.lockd RET   sendto 72/0x48
 58205 rpc.lockd CALL  write(0x8,0xbfbfe680,0x20)
 58205 rpc.lockd RET   write -1 errno 32 Broken pipe
 58205 rpc.lockd PSIG  SIGPIPE SIG_DFL
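
The two datagrams on fd 12 just before the crash look like a portmapper
GETPORT exchange, which would explain the "Program not registered" syslog
line. The sketch below decodes the bytes copied from the dump above. It
assumes the standard ONC RPC v2 layout with empty AUTH_NONE credentials; the
struct and function names are this sketch's own and do not come from
rpc.lockd. The destination address of the request is not visible in the
trace, so which host answered is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arguments of a portmapper GETPORT request (names are this sketch's own). */
struct getport_args {
    uint32_t prog;  /* RPC program being looked up */
    uint32_t vers;  /* program version */
    uint32_t prot;  /* IP protocol number (17 = UDP) */
};

/* The 56-byte datagram sent on fd 12, copied from the kdump output. */
static const unsigned char getport_call[56] = {
    0x44, 0x70, 0x14, 0xfd,  0, 0, 0, 0,  0, 0, 0, 2,  0x00, 0x01, 0x86, 0xa0,
    0, 0, 0, 2,  0, 0, 0, 3,  0, 0, 0, 0,  0, 0, 0, 0,
    0, 0, 0, 0,  0, 0, 0, 0,
    0x00, 0x01, 0x86, 0xb5,  0, 0, 0, 4,  0, 0, 0, 0x11,  0, 0, 0, 0
};

/* The 28-byte reply read from fd 12, copied from the kdump output. */
static const unsigned char getport_reply[28] = {
    0x44, 0x70, 0x14, 0xfd,  0, 0, 0, 1,  0, 0, 0, 0,  0, 0, 0, 0,
    0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0
};

/* Read the i-th big-endian 32-bit word of an XDR buffer. */
static uint32_t xdr_word(const unsigned char *buf, size_t i)
{
    const unsigned char *p = buf + 4 * i;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a GETPORT call; returns 0 on success, -1 if the layout differs. */
static int decode_getport_call(const unsigned char *buf, size_t len,
                               struct getport_args *out)
{
    assert(buf != NULL && out != NULL);
    if (len != 56)
        return -1;
    if (xdr_word(buf, 1) != 0 || xdr_word(buf, 2) != 2)
        return -1;              /* not a CALL, or not RPC version 2 */
    if (xdr_word(buf, 3) != 100000 || xdr_word(buf, 4) != 2 ||
        xdr_word(buf, 5) != 3)
        return -1;              /* not portmapper v2, procedure GETPORT */
    if (xdr_word(buf, 7) != 0 || xdr_word(buf, 9) != 0)
        return -1;              /* credential or verifier body not empty */
    out->prog = xdr_word(buf, 10);
    out->vers = xdr_word(buf, 11);
    out->prot = xdr_word(buf, 12);
    return 0;
}

/* Return the port from an accepted, successful reply, or -1 otherwise. */
static long decode_getport_reply(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len != 28)
        return -1;
    if (xdr_word(buf, 1) != 1 || xdr_word(buf, 2) != 0)
        return -1;              /* not a REPLY, or not MSG_ACCEPTED */
    if (xdr_word(buf, 4) != 0 || xdr_word(buf, 5) != 0)
        return -1;              /* verifier not empty, or not SUCCESS */
    return (long)xdr_word(buf, 6);
}
```

Read this way, the request asks for program 100021 (the number assigned to
the NFS lock manager), version 4, over UDP, and the reply carries port 0,
which the portmapper uses to say that the program is not registered.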
	
>How-To-Repeat:
	Unknown, but this happens frequently on our machines.
	
>Fix:
	Unknown.

	


>Release-Note:
>Audit-Trail:

From: Craig Rodrigues <rodrigc@crodrigues.org>
To: Gea-Suan Lin <gslin@csie.nctu.edu.tw>
Cc: bug-followup@freebsd.org
Subject: Re: bin/97768: NFS rpc.lockd will die automatically
Date: Wed, 24 May 2006 00:35:43 -0400

 On Wed, May 24, 2006 at 12:19:48PM +0800, Gea-Suan Lin wrote:
 > >Description:
 > - rpc.lockd (uid = daemon) dies unexpectedly on FreeBSD
 >   6.0-RELEASE-p7. This is the ktrace/kdump log:
 > 
 >  58205 rpc.lockd CALL  sendto(0x7,0xbfbfd940,0x48,0,0,0)
 >  58205 rpc.lockd GIO   fd 7 wrote 72 bytes
 >        "<27>May 24 03:02:10 rpc.lockd: Unable to return result to 140.113.209.21"
 >  58205 rpc.lockd RET   sendto 72/0x48
 >  58205 rpc.lockd CALL  write(0x8,0xbfbfe680,0x20)
 >  58205 rpc.lockd RET   write -1 errno 32 Broken pipe
 >  58205 rpc.lockd PSIG  SIGPIPE SIG_DFL
 
 
 Does this patch to rpc.lockd help?
 
 ===================================================================
 RCS file: /home/ncvs/src/usr.sbin/rpc.lockd/kern.c,v
 retrieving revision 1.17
 diff -u -u -r1.17 kern.c
 --- kern.c      17 Nov 2005 12:19:19 -0000      1.17
 +++ kern.c      24 May 2006 04:25:05 -0000
 @@ -151,6 +151,7 @@
 
         signal(SIGHUP, (sig_t)client_cleanup);
         signal(SIGTERM, (sig_t)client_cleanup);
 +       signal(SIGPIPE, SIG_IGN);
 
         /* Setup. */
         (void)time(&owner.tod);
 
 -- 
 Craig Rodrigues        
 rodrigc@crodrigues.org
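
A minimal sketch of what the one-line patch changes, using an ordinary pipe
as a stand-in for the descriptor that failed in the trace (the function name
is hypothetical and this is not rpc.lockd code). With SIGPIPE at its default
disposition, writing to a pipe with no reader kills the process, as the PSIG
line in the trace shows. With SIGPIPE ignored, the same write() should fail
with EPIPE and the caller gets a chance to handle the error.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ignore SIGPIPE, then write one byte to a pipe whose read end is closed.
 * Returns the errno reported by write(), 0 if the write unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
static int write_errno_with_sigpipe_ignored(void)
{
    int fds[2];
    ssize_t n;
    int err;

    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (pipe(fds) == -1)
        return -1;

    close(fds[0]);                  /* no reader is left on the pipe */

    n = write(fds[1], "x", 1);      /* would raise SIGPIPE under SIG_DFL */
    assert(n == -1 || n == 1);      /* a 1-byte write either fails or completes */
    err = (n == -1) ? errno : 0;

    close(fds[1]);
    return err;
}
```

Ignoring the signal only turns the crash into an error return; the daemon
still has to check the result of write() to notice that the peer is gone.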

From: Gea-Suan Lin <gslin@csie.nctu.edu.tw>
To: Craig Rodrigues <rodrigc@crodrigues.org>
Cc: Gea-Suan Lin <gslin@csie.nctu.edu.tw>, bug-followup@freebsd.org
Subject: Re: bin/97768: NFS rpc.lockd will die automatically
Date: Fri, 26 May 2006 00:58:16 +0800

 Hi,
 
 After applying this patch, the machines look good now.
 
 Is this some kind of race condition? I mean, between kevent() returning
 "ok, you can write" and rpc.lockd calling write(), the connection on fd=8
 disappeared, so write() caused SIGPIPE.
 
 On Wed, May 24, 2006 at 12:35:43AM -0400, Craig Rodrigues wrote:
 > On Wed, May 24, 2006 at 12:19:48PM +0800, Gea-Suan Lin wrote:
 > > >Description:
 > > - rpc.lockd (uid = daemon) dies unexpectedly on FreeBSD
 > >   6.0-RELEASE-p7. This is the ktrace/kdump log:
 > > 
 > >  58205 rpc.lockd CALL  sendto(0x7,0xbfbfd940,0x48,0,0,0)
 > >  58205 rpc.lockd GIO   fd 7 wrote 72 bytes
 > >        "<27>May 24 03:02:10 rpc.lockd: Unable to return result to 140.113.209.21"
 > >  58205 rpc.lockd RET   sendto 72/0x48
 > >  58205 rpc.lockd CALL  write(0x8,0xbfbfe680,0x20)
 > >  58205 rpc.lockd RET   write -1 errno 32 Broken pipe
 > >  58205 rpc.lockd PSIG  SIGPIPE SIG_DFL
 > 
 > 
 > Does this patch to rpc.lockd help?
 > 
 > ===================================================================
 > RCS file: /home/ncvs/src/usr.sbin/rpc.lockd/kern.c,v
 > retrieving revision 1.17
 > diff -u -u -r1.17 kern.c
 > --- kern.c      17 Nov 2005 12:19:19 -0000      1.17
 > +++ kern.c      24 May 2006 04:25:05 -0000
 > @@ -151,6 +151,7 @@
 > 
 >         signal(SIGHUP, (sig_t)client_cleanup);
 >         signal(SIGTERM, (sig_t)client_cleanup);
 > +       signal(SIGPIPE, SIG_IGN);
 > 
 >         /* Setup. */
 >         (void)time(&owner.tod);
 
 -- 
 * Gea-Suan Lin  (public key: Using https://keyserver.pgp.com/ to search)
 * If you cannot convince them, confuse them.           -- Harry S Truman
State-Changed-From-To: open->patched 
State-Changed-By: rodrigc 
State-Changed-When: Thu May 25 22:13:45 UTC 2006 
State-Changed-Why:  
Fixed in CURRENT, will MFC to RELENG_6. 


Responsible-Changed-From-To: freebsd-bugs->rodrigc 
Responsible-Changed-By: rodrigc 
Responsible-Changed-When: Thu May 25 22:13:45 UTC 2006 
Responsible-Changed-Why:  
Fixed in CURRENT, will MFC to RELENG_6. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=97768 

From: Craig Rodrigues <rodrigc@crodrigues.org>
To: Gea-Suan Lin <gslin@csie.nctu.edu.tw>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/97768: NFS rpc.lockd will die automatically
Date: Fri, 26 May 2006 07:45:30 -0400

 On Fri, May 26, 2006 at 04:21:46PM +0800, Gea-Suan Lin wrote:
 > Hello sir,
 > 
 > Today we found the uid=root rpc.lockd dead; the cause is SIGPIPE too.
 > 
 > After checking your patch, I found that signal(SIGPIPE, SIG_IGN); only
 > affects the uid=daemon rpc.lockd (because it runs after fork()), so maybe
 > you should put signal() before fork()?
 
 
 Can you put the signal(SIGPIPE, SIG_IGN); call before the fork() in
 client_request() and try again?  If it works for you, I will change my patch.
 
  
 -- 
 Craig Rodrigues        
 rodrigc@crodrigues.org

From: Gea-Suan Lin <gslin@csie.nctu.edu.tw>
To: Craig Rodrigues <rodrigc@crodrigues.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/97768: NFS rpc.lockd will die automatically
Date: Sat, 27 May 2006 07:29:40 +0800

 Hello,
 
 On Fri, May 26, 2006 at 07:45:30AM -0400, Craig Rodrigues wrote:
 > On Fri, May 26, 2006 at 04:21:46PM +0800, Gea-Suan Lin wrote:
 > > Hello sir,
 > > 
 > > Today we found the uid=root rpc.lockd dead; the cause is SIGPIPE too.
 > > 
 > > After checking your patch, I found that signal(SIGPIPE, SIG_IGN); only
 > > affects the uid=daemon rpc.lockd (because it runs after fork()), so maybe
 > > you should put signal() before fork()?
 > 
 > Can you put the signal(SIGPIPE, SIG_IGN); call before the fork() in
 > client_request() and try again?  If it works for you, I will change my patch.
 
 After 12 hours on both the NFS clients (14 clients) and servers (2 servers),
 no rpc.lockd has died, and the locking mechanism works fine.
 
 --- kern.c.orig Sat May 27 07:11:02 2006
 +++ kern.c      Fri May 26 20:32:54 2006
 @@ -136,6 +136,9 @@
                 syslog(LOG_ERR, "open: %s: %m", _PATH_NFSLCKDEV);
                 goto err;
         }
 +
 +       signal(SIGPIPE, SIG_IGN);
 +
         /*
          * Create a separate process, the client code is really a separate
          * daemon that shares a lot of code.
 
 -- 
 * Gea-Suan Lin  (public key: Using https://keyserver.pgp.com/ to search)
 * If you cannot convince them, confuse them.           -- Harry S Truman
State-Changed-From-To: patched->closed 
State-Changed-By: rodrigc 
State-Changed-When: Tue Sep 12 16:59:52 UTC 2006 
State-Changed-Why:  
MFC'd, revision: 1.16.2.1 of src/usr.sbin/rpc.lockd/kern.c 
Reminded by delphij to close this. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=97768 
>Unformatted:
