From nobody@FreeBSD.org  Thu Apr  5 09:10:06 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C78D116A402
	for <freebsd-gnats-submit@FreeBSD.org>; Thu,  5 Apr 2007 09:10:06 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [69.147.83.33])
	by mx1.freebsd.org (Postfix) with ESMTP id 9EB9113C45D
	for <freebsd-gnats-submit@FreeBSD.org>; Thu,  5 Apr 2007 09:10:06 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id l3599wxh096206
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 5 Apr 2007 09:09:58 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id l3594vbs095444;
	Thu, 5 Apr 2007 09:04:57 GMT
	(envelope-from nobody)
Message-Id: <200704050904.l3594vbs095444@www.freebsd.org>
Date: Thu, 5 Apr 2007 09:04:57 GMT
From: Zhouyi Zhou<zhouyi04@ios.cn>
To: freebsd-gnats-submit@FreeBSD.org
Subject: FreeBSD kernel dead lock and a solution
X-Send-Pr-Version: www-3.0

>Number:         111260
>Category:       kern
>Synopsis:       [hang] FreeBSD kernel dead lock and a solution
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    csjp
>State:          patched
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr 05 09:20:02 GMT 2007
>Closed-Date:    
>Last-Modified:  Tue Aug 12 21:40:02 UTC 2008
>Originator:     Zhouyi Zhou
>Release:        FreeBSD 5-7
>Organization:
Institute of Software, Chinese Academy of Sciences
>Environment:
FreeBSD zzy 6.0-RELEASE FreeBSD 6.0-RELEASE #5 i386
>Description:
In recent testing of FreeBSD using people.freebsd.org/~pho/stress/src/stress2.tgz, running the 7 tests simultaneously causes the FreeBSD kernel to deadlock after three or more days.

The reason is as follows:
In function vm_fault at vm/vm_fault.c

299         fs.vp = vnode_pager_lock(fs.first_object);
300         KASSERT(fs.vp == NULL || !fs.map->system_map,

The kernel tries to lock fs.vp while fs.map is still locked.

Meanwhile, in function do_execve at kern/kern_exec.c

462         if (p->p_sysent->sv_copyout_strings)
463                 stack_base = (*p->p_sysent->sv_copyout_strings)(imgp);
464         else
465                 stack_base = exec_copyout_strings(imgp);
466 
467         /*
468          * If custom stack fixup routine present for this process
469          * let it do the stack setup.
470          * Else stuff argument count as first item on stack
471          */
472         if (p->p_sysent->sv_fixup != NULL)
473                 (*p->p_sysent->sv_fixup)(&stack_base, imgp);
474         else
475                 suword(--stack_base, imgp->args->argc);
The copyout function called on line 463 or 465 may trigger vm_fault, which then tries to lock kernel_map->root->object.sub_map. That map is already locked by another process in the vm_fault path shown above, and in the meantime imgp->vp is still locked.
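
The lock-order inversion described above can be modelled outside the kernel. What follows is a minimal userspace sketch in C, not kernel code: toy_thread, LOCK_VNODE, LOCK_MAP and exec_vs_fault_deadlocks are hypothetical names that only stand in for the lock on imgp->vp and the map lock. The sketch follows "blocked on a lock held by" edges from one thread and reports whether the chain leads back to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two locks named in the report. */
enum toy_lock_id { LOCK_VNODE, LOCK_MAP, LOCK_COUNT };

struct toy_thread {
	bool holds[LOCK_COUNT];	/* locks this thread currently owns */
	int wants;		/* lock it is blocked on, or -1 if runnable */
};

/* Index of the thread holding `lock`, or n if the lock is free. */
static size_t
toy_holder(const struct toy_thread *t, size_t n, int lock)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].holds[lock])
			return (i);
	return (n);
}

/*
 * Follow "blocked on a lock held by" edges starting at thread `start`.
 * Returns true when the chain leads back to `start` (a wait cycle).
 */
static bool
toy_in_deadlock(const struct toy_thread *t, size_t n, size_t start)
{
	size_t cur = start;

	assert(start < n);
	for (size_t step = 0; step < n; step++) {
		if (t[cur].wants < 0)
			return (false);	/* chain ends at a runnable thread */
		size_t holder = toy_holder(t, n, t[cur].wants);
		if (holder == n)
			return (false);	/* wanted lock is free */
		if (holder == start)
			return (true);
		cur = holder;
	}
	return (false);
}

/*
 * Thread 0 models do_execve faulting in copyout and wanting the map lock;
 * thread 1 models vm_fault holding the map lock and wanting the vnode.
 */
static bool
exec_vs_fault_deadlocks(bool exec_keeps_vnode_locked)
{
	struct toy_thread t[2] = {
		{ .holds = { [LOCK_VNODE] = exec_keeps_vnode_locked },
		  .wants = LOCK_MAP },
		{ .holds = { [LOCK_MAP] = true }, .wants = LOCK_VNODE },
	};

	return (toy_in_deadlock(t, 2, 0));
}
```

In this model, calling exec_vs_fault_deadlocks with true finds a wait cycle, while calling it with false lets thread 1 take the vnode lock, so the chain ends without a cycle.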

>How-To-Repeat:
Use people.freebsd.org/~pho/stress/src/stress2.tgz and run the 7 tests simultaneously; the FreeBSD kernel deadlocks after three or more days.

>Fix:
Add VOP_UNLOCK(imgp->vp, 0, td);
before
462         if (p->p_sysent->sv_copyout_strings)
463                 stack_base = (*p->p_sysent->sv_copyout_strings)(imgp);
464         else
And add vn_lock(imgp->vp, LK_EXCLUSIVE | LK_RETRY, td);
after
474         else
475                 suword(--stack_base, imgp->args->argc);
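
The shape of the proposed change can be illustrated with a small userspace model. This is only a sketch under stated assumptions: toy_lock, toy_image, toy_copyout and toy_stack_setup are hypothetical names, and the toy lock is a plain flag rather than a real vnode lock. It shows the lock being dropped before the modelled copyouts and taken again afterwards, so that no copyout runs while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_lock {
	bool held;
};

struct toy_image {
	struct toy_lock vp_lock;	/* stands in for the lock on imgp->vp */
	int copyouts;			/* number of modelled copyout calls */
	int copyouts_under_lock;	/* how many ran with vp_lock held */
};

static void
toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void
toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Models a copyout that may fault: records whether vp_lock was held. */
static void
toy_copyout(struct toy_image *img)
{
	img->copyouts++;
	if (img->vp_lock.held)
		img->copyouts_under_lock++;
}

/* Entered with vp_lock held; returns with vp_lock held again. */
static void
toy_stack_setup(struct toy_image *img)
{
	toy_lock_release(&img->vp_lock);	/* VOP_UNLOCK(imgp->vp, 0, td) */
	toy_copyout(img);			/* copyout of the strings */
	toy_copyout(img);			/* fixup routine or suword() */
	toy_lock_acquire(&img->vp_lock);	/* vn_lock(imgp->vp, ...) */
}

/* Returns the number of copyouts that ran under the lock. */
static int
toy_run_fixed_exec(void)
{
	struct toy_image img = { .vp_lock = { .held = true } };

	toy_stack_setup(&img);
	assert(img.vp_lock.held && img.copyouts == 2);
	return (img.copyouts_under_lock);
}
```

The model says nothing about whether dropping the vnode lock at this point is safe in the real kernel; it only shows the ordering of the unlock, the copyouts and the relock.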
>Release-Note:
>Audit-Trail:

From: zhouyi zhou <zhouzhouyi@ercist.iscas.ac.cn>
To: bug-followup@FreeBSD.org
Cc: tegge@FreeBSD.org
Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
Date: Fri, 6 Apr 2007 15:30:08 +0800

 Dear Mr. Egge,
   Could you take a look at kern/111260? This case is similar to your modification
 of kern_exec.c in revision 1.292 in May 2006.
 Thanks a lot.
 Sincerely,
 Zhouyi Zhou

From: Kris Kennaway <kris@obsecurity.org>
To: Zhouyi Zhou <zhouyi04@ios.cn>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
Date: Fri, 6 Apr 2007 23:25:04 -0400

 On Thu, Apr 05, 2007 at 09:04:57AM +0000, Zhouyi Zhou wrote:
 
 > In recent testing of FreeBSD using people.freebsd.org/~pho/stress/src/stress2.tgz, running the 7 tests simultaneously causes the FreeBSD kernel to deadlock after three or more days.
 > 
 > The reason is as follows:
 > In function vm_fault at vm/vm_fault.c
 > 
 > 299         fs.vp = vnode_pager_lock(fs.first_object);
 > 300         KASSERT(fs.vp == NULL || !fs.map->system_map,
 > 
 > The kernel tries to lock fs.vp while fs.map is still locked.
 > 
 > Meanwhile, in function do_execve at kern/kern_exec.c
 > 
 > 462         if (p->p_sysent->sv_copyout_strings)
 > 463                 stack_base = (*p->p_sysent->sv_copyout_strings)(imgp);
 > 464         else
 > 465                 stack_base = exec_copyout_strings(imgp);
 > 466 
 > 467         /*
 > 468          * If custom stack fixup routine present for this process
 > 469          * let it do the stack setup.
 > 470          * Else stuff argument count as first item on stack
 > 471          */
 > 472         if (p->p_sysent->sv_fixup != NULL)
 > 473                 (*p->p_sysent->sv_fixup)(&stack_base, imgp);
 > 474         else
 > 475                 suword(--stack_base, imgp->args->argc);
 > The copyout function called on line 463 or 465 may trigger vm_fault, which then tries to lock kernel_map->root->object.sub_map. That map is already locked by another process in the vm_fault path shown above, and in the meantime imgp->vp is still locked.
 
 Can you please provide backtraces that lead you to this conclusion?
 
 Thanks,
 Kris

From: Kris Kennaway <kris@obsecurity.org>
To: Zhouyi Zhou <zhouzhouyi@ercist.iscas.ac.cn>
Cc: Kris Kennaway <kris@obsecurity.org>, freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
Date: Sat, 7 Apr 2007 04:08:02 -0400

 On Sat, Apr 07, 2007 at 03:36:40PM +0800, Zhouyi Zhou wrote:
 > Dear Mr Kennaway,
 >     The kernel is sure to deadlock after several days of running the tests simultaneously. I use FreeBSD's DEBUG_LOCKS option
 > with lk_stack to infer where the thread acquired the lock, and when the thread is not swapped out, I use
 > ((struct i386_frame *)(struct thread * (0xc*****))->td_pcb->pcb_ebp)->f_frame->f_frame-> ...... ->f_retaddr
 > to infer what led the thread to sleep.
 >    Besides all of the above, to find the cause of the deadlock, I modified
 > sys/stack.h to:
 > #define STACK_MAX       50
 > 
 > struct sbuf;
 > 
 > struct stack {
 >         int             depth;
 >         vm_offset_t     pcs[STACK_MAX];
 >         vm_offset_t     arg0[STACK_MAX];
 > };
 > and the function stack_save in file i386/i386/db_trace.c
 > to save the first argument alongside the return address.
 >    And to trace a swapped-out thread, I modified the thread struct in sys/proc.h and the msleep function
 > in kern/kern_synch.c to save the call stack when the thread is about to sleep:
 > struct thread {
 >         struct proc     *td_proc;       /* (*) Associated process. */
 >         struct ksegrp   *td_ksegrp;
 > .....
 >         struct stack    td_stack;
 > };
 > 
 > int
 > msleep(ident, mtx, priority, wmesg, timo)
 >         void *ident;
 >         struct mtx *mtx;
 >         int priority, timo;
 >         const char *wmesg;
 > {
 >         struct thread *td;
 >         struct proc *p;
 >         int catch, rval, flags;
 >         WITNESS_SAVE_DECL(mtx);
 > 
 >         td = curthread;
 >         stack_save(&td->td_stack);
 >      It is absolutely evident that it is the
 >  462 if (p->p_sysent->sv_copyout_strings)
 >  463 stack_base = (*p->p_sysent->sv_copyout_strings)(imgp);
 > in do_execve that leads to the deadlock.
 
 These are your conclusions, I am asking for the stack traces that lead
 you to them so that we can verify your observations.
 
 Kris

From: "Zhouyi Zhou" <zhouzhouyi@ercist.iscas.ac.cn>
To: "Kris Kennaway" <kris@obsecurity.org>
Cc: <freebsd-gnats-submit@FreeBSD.org>
Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
Date: Sat, 7 Apr 2007 15:36:40 +0800

 Dear Mr Kennaway,
     The kernel is sure to deadlock after several days of running the tests simultaneously. I use FreeBSD's DEBUG_LOCKS option
 with lk_stack to infer where the thread acquired the lock, and when the thread is not swapped out, I use
 ((struct i386_frame *)(struct thread * (0xc*****))->td_pcb->pcb_ebp)->f_frame->f_frame-> ...... ->f_retaddr
 to infer what led the thread to sleep.
    Besides all of the above, to find the cause of the deadlock, I modified
 sys/stack.h to:
 #define STACK_MAX       50
 
 struct sbuf;
 
 struct stack {
         int             depth;
         vm_offset_t     pcs[STACK_MAX];
         vm_offset_t     arg0[STACK_MAX];
 };
 and the function stack_save in file i386/i386/db_trace.c
 to save the first argument alongside the return address.
    And to trace a swapped-out thread, I modified the thread struct in sys/proc.h and the msleep function
 in kern/kern_synch.c to save the call stack when the thread is about to sleep:
 struct thread {
         struct proc     *td_proc;       /* (*) Associated process. */
         struct ksegrp   *td_ksegrp;
 .....
         struct stack    td_stack;
 };
 
 int
 msleep(ident, mtx, priority, wmesg, timo)
         void *ident;
         struct mtx *mtx;
         int priority, timo;
         const char *wmesg;
 {
         struct thread *td;
         struct proc *p;
         int catch, rval, flags;
         WITNESS_SAVE_DECL(mtx);
 
         td = curthread;
         stack_save(&td->td_stack);
      It is absolutely evident that it is the
  462 if (p->p_sysent->sv_copyout_strings)
  463 stack_base = (*p->p_sysent->sv_copyout_strings)(imgp);
 in do_execve that leads to the deadlock.
 
 Thank you very much
 Sincerely yours
 Zhouyi Zhou
 
 ----- Original Message -----
 From: "Kris Kennaway" <kris@obsecurity.org>
 To: "Zhouyi Zhou" <zhouyi04@ios.cn>
 Cc: <freebsd-gnats-submit@FreeBSD.org>
 Sent: Saturday, April 07, 2007 11:25 AM
 Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
 > Can you please provide backtraces that lead you to this conclusion?
 > 
 > Thanks,
 > Kris
 >
 

From: "Zhouyi Zhou" <zhouzhouyi@ercist.iscas.ac.cn>
To: <freebsd-gnats-submit@FreeBSD.org>
Cc:  
Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
Date: Sat, 7 Apr 2007 16:14:45 +0800

 Dear Kennaway,
       My machine has been restarted. I will run the tests again tomorrow and
 will give you the backtrace after 3 or 4 days.
       Thanks a lot
 Sincerely yours
 Zhouyi Zhou
 

From: zhouyi zhou <zhouzhouyi@ercist.iscas.ac.cn>
To: Kris Kennaway <kris@obsecurity.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
Date: Thu, 25 Oct 2007 10:29:22 +0800

 Hi Kris,
   The deadlock has appeared again; see the photos I took. If you want more, I can post more :-)
 Thanks
 
 http://wiki.freebsd.org/ZhouyiZHOU?action=AttachFile&do=get&target=lock1.jpg
 http://wiki.freebsd.org/ZhouyiZHOU?action=AttachFile&do=get&target=lock2.jpg
 http://wiki.freebsd.org/ZhouyiZHOU?action=AttachFile&do=get&target=lock3.jpg
 
 Best Regards
 Zhouyi Zhou
 On Sat, 7 Apr 2007 04:08:02 -0400
 Kris Kennaway <kris@obsecurity.org> wrote:
 
 >
 > These are your conclusions, I am asking for the stack traces that lead
 > you to them so that we can verify your observations.
 > 
 > Kris
 > 

From: zhouyi zhou <zhouzhouyi@ercist.iscas.ac.cn>
To: kris@obsecurity.org,linimon@freebsd.org
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/111260: FreeBSD kernel dead lock and a solution
Date: Thu, 25 Oct 2007 13:35:52 +0800

 Using gdb, I found that the slow_copyout in lock1.jpg is caused by
 do_execve's call to
 	if (p->p_sysent->sv_copyout_strings)
 		stack_base = (*p->p_sysent->sv_copyout_strings)(imgp);
 	else
 		stack_base = exec_copyout_strings(imgp);
 
 The slow_copyout in lock2.jpg is caused by
 exec_elf32_imgact's call to
 
 			if ((error = __elfN(load_section)(imgp->proc, vmspace,
 			    imgp->vp, imgp->object, phdr[i].p_offset,
 			    (caddr_t)((uintptr_t)phdr[i].p_vaddr + base_addr),
 			    phdr[i].p_memsz, phdr[i].p_filesz, prot,
 			    sv->sv_pagesize)) != 0)
   				goto fail;
 
 Sincerely,
 Zhouyi
 
 On Thu, 25 Oct 2007 10:29:22 +0800
 zhouyi zhou <zhouzhouyi@ercist.iscas.ac.cn> wrote:
 
 > Hi Kris,
 >   The deadlock has appeared again; see the photos I took. If you want more, I can post more :-)
 > Thanks
 > 
 > http://wiki.freebsd.org/ZhouyiZHOU?action=AttachFile&do=get&target=lock1.jpg
 > http://wiki.freebsd.org/ZhouyiZHOU?action=AttachFile&do=get&target=lock2.jpg
 > http://wiki.freebsd.org/ZhouyiZHOU?action=AttachFile&do=get&target=lock3.jpg
 > 
 > Best Regards
 > Zhouyi Zhou
Responsible-Changed-From-To: freebsd-bugs->csjp 
Responsible-Changed-By: csjp 
Responsible-Changed-When: Tue Aug 12 15:44:32 UTC 2008 
Responsible-Changed-Why:  
I can work with the submitter to try and fix this. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=111260 
State-Changed-From-To: open->patched 
State-Changed-By: csjp 
State-Changed-When: Tue Aug 12 21:29:28 UTC 2008 
State-Changed-Why:  
This has been fixed in HEAD.  Once it's clear this isn't going to 
cause any problems, I will MFC to -STABLE. 

Thank you for tracking this down! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=111260 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/111260: commit references a PR
Date: Tue, 12 Aug 2008 21:28:07 +0000 (UTC)

 csjp        2008-08-12 21:27:48 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/kern             kern_exec.c 
   Log:
   SVN rev 181647 on 2008-08-12 21:27:48Z by csjp
   
   Reduce the scope of the vnode lock such that it does not cover
   the various copyouts associated with initializing the process's
   argv/env data in userspace.  It is possible that these copyout
   operations can fault under memory pressure, possibly resulting
   in dead locks.  This is believed to be safe since none of the
   copyout_strings() operations need to interact with the vnode here.
   
   Submitted by:   Zhouyi Zhou
   PR:             kern/111260
   Discussed with: kib
   MFC after:      3 weeks
   
   Revision  Changes    Path
   1.321     +5 -1      src/sys/kern/kern_exec.c
 
>Unformatted:
