From dmlb@dmlb.org  Thu Sep  5 13:41:08 2002
Return-Path: <dmlb@dmlb.org>
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 282FA37B400
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  5 Sep 2002 13:41:08 -0700 (PDT)
Received: from dmlb.org (pc1-cmbg2-6-cust106.cam.cable.ntl.com [80.4.4.106])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A625643E42
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  5 Sep 2002 13:41:07 -0700 (PDT)
	(envelope-from dmlb@dmlb.org)
Received: from slave.my.domain ([192.168.200.39])
	by dmlb.org with esmtp (TLSv1:DES-CBC3-SHA:168)
	(Exim 3.36 #1)
	id 17n3R3-000DVr-00; Thu, 05 Sep 2002 21:41:05 +0100
Received: from dmlb by slave.my.domain with local (Exim 3.36 #1)
	id 17n3R3-0000Ms-00; Thu, 05 Sep 2002 21:41:05 +0100
Message-Id: <E17n3R3-0000Ms-00@slave.my.domain>
Date: Thu, 05 Sep 2002 21:41:05 +0100
From: Duncan Barclay <dmlb@dmlb.org>
Reply-To: Duncan Barclay <dmlb@dmlb.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc: marcel@xclint.net, dmlb@dmlb.org
Subject: Hack to allow Linux Matlab to exit
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         42457
>Category:       kern
>Synopsis:       Hack to allow Linux Matlab to exit
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 05 13:50:01 PDT 2002
>Closed-Date:    Wed Mar 03 01:34:28 PST 2004
>Last-Modified:  Wed Mar 03 01:34:28 PST 2004
>Originator:     Duncan Barclay
>Release:        FreeBSD 4.6-PRERELEASE i386
>Organization:
>Environment:
System: FreeBSD slave.my.domain 4.6-PRERELEASE FreeBSD 4.6-PRERELEASE #2: Thu Sep 5 21:11:18 BST 2002 dmlb@slave.my.domain:/usr/src-CVSup/sys/compile/SLAVE i386
>Description:
Linux Matlab versions 6 and 6.1, and possibly 6.5, are known to hang
on exit when the Matlab Java VM is used; a kill -9 is required to end it.

When using its JVM, Matlab creates a number of threads:
 matlab
   matlab thread #1
     matlab thread #1.1
     matlab thread #1.2
     matlab thread #1.3

On exit, threads #1.1, #1.2 and #1.3 die gracefully and are reaped by
thread #1. However, thread #1 itself is not reaped, because matlab
apparently issues
	linux_wait4(-1, &foo, 0, 0).
This call reaps child processes, not threads.

Thread #1 is created with
	linux_clone(0xf00, *bar())
The low byte of the flags mask (0xf00) is zero, so the thread does not
want its parent to be sent a signal when it dies.
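As a sketch of how that flags word breaks down (the LINUX_ macro prefix and the helper names are hypothetical; the bit values are those defined in Linux's <linux/sched.h>), 0xf00 is CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND with an exit-signal byte of 0:

```c
#include <stdint.h>

/* Bit values as defined in Linux's <linux/sched.h>; the prefix is ours. */
#define LINUX_CSIGNAL       0x000000ffu  /* low byte: exit signal number */
#define LINUX_CLONE_VM      0x00000100u  /* share address space */
#define LINUX_CLONE_FS      0x00000200u  /* share cwd, root, umask */
#define LINUX_CLONE_FILES   0x00000400u  /* share file descriptor table */
#define LINUX_CLONE_SIGHAND 0x00000800u  /* share signal handlers */

/* Signal the child asks to have sent to its parent on exit; 0 means none. */
static unsigned clone_exit_signal(uint32_t flags)
{
    return flags & LINUX_CSIGNAL;
}

/* The sharing flags, i.e. every bit above the exit-signal byte. */
static uint32_t clone_share_flags(uint32_t flags)
{
    return flags & ~LINUX_CSIGNAL;
}
```

For flags 0xf00, clone_exit_signal() gives 0 and clone_share_flags() gives all four CLONE_ bits, which is the "thread with no exit signal" case described above.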

From linux clone(2):
        The low byte of flags contains the number of the signal sent
        to the parent when the child dies. If this signal is specified
        as anything other than SIGCHLD , then the parent process must
        specify the __WALL or __WCLONE options when waiting for the
        child with wait (2). If no signal is specified, then the
        parent process is not signaled when the child terminates.
[note the last sentence]
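The waiting rule in that excerpt can be modelled as a small predicate. This is a hypothetical sketch, assuming the __WALL and __WCLONE values from Linux's <linux/wait.h> and SIGCHLD = 17 as on Linux/i386; the names are ours. It shows that a wait with options 0 never matches a child whose exit signal is 0, so matlab's wait4(-1, &foo, 0, 0) cannot reap thread #1 on Linux either; presumably it is harmless there only because no SIGCHLD arrives for that thread.

```c
#include <stdint.h>

/* Values as in Linux's <linux/wait.h> and Linux/i386 signal numbering;
 * the LINUX_ prefix is ours. */
#define LINUX_WALL     0x40000000u  /* __WALL: wait for any child */
#define LINUX_WCLONE   0x80000000u  /* __WCLONE: wait for clone children */
#define LINUX_SIGCHLD  17

/* A "clone" child is one whose exit signal is anything but SIGCHLD,
 * including no signal at all. */
static int is_clone_child(int exit_signal)
{
    return exit_signal != LINUX_SIGCHLD;
}

/* Nonzero if a Linux wait4() with these options would consider the child. */
static int wait_may_reap(uint32_t options, int exit_signal)
{
    if (options & LINUX_WALL)
        return 1;                            /* any kind of child */
    if (options & LINUX_WCLONE)
        return is_clone_child(exit_signal);  /* clone children only */
    return !is_clone_child(exit_signal);     /* ordinary children only */
}
```

For example, wait_may_reap(0, 0) is 0 while wait_may_reap(0x80000000, 0) is 1: only a __WCLONE or __WALL wait can collect a child created with a zero exit-signal byte.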

FreeBSD always sends a signal to the parent when a process
terminates, as in this code from /sys/kern_exit.c:exit1():

        if (p->p_sigparent && p->p_pptr != initproc) {
                psignal(p->p_pptr, p->p_sigparent);
        } else {
                psignal(p->p_pptr, SIGCHLD);
        }
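The branch above can be restated as a pure function. This is a simplified, hypothetical sketch: the function and parameter names are ours, and the real exit1() posts the signal with psignal() rather than returning it. It makes explicit that the else branch is taken whenever p_sigparent is 0, so a clone with no exit signal still delivers SIGCHLD.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/*
 * Hypothetical model of the exit1() branch quoted above.
 *   sigparent      - the exiting process's p_sigparent (presumably 0 for
 *                    thread #1, which asked for no exit signal)
 *   parent_is_init - nonzero when p->p_pptr == initproc
 * Returns the signal that would be posted to the parent.
 */
static int original_exit_signal(int sigparent, int parent_is_init)
{
    if (sigparent && !parent_is_init)
        return sigparent;  /* custom exit signal, e.g. set by linux_clone() */
    return SIGCHLD;        /* also taken when sigparent == 0 */
}
```

With sigparent = 0 the model returns SIGCHLD, which is the signal that wakes matlab's handler in the trace below.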

FreeBSD therefore sends matlab a SIGCHLD, and matlab's SIGCHLD
handler issues the wait4 shown above. This is visible in the following
ktrace output, with matlab pid = 6255 and thread #1 pid = 6304.

  6304 matlab   CALL  linux_kill(0x186f,0x20)
  6255 matlab   PSIG  SIG(null) caught handler=0x28c96e10 mask=0x80000000 code=0x0
  6304 matlab   RET   linux_kill 0
  6304 matlab   CALL  exit(0)
  6255 matlab   RET   linux_rt_sigsuspend -1 errno 4 Interrupted system call
  6255 matlab   PSIG  SIGCHLD caught handler=0x28c97460 mask=0x80000000 code=0x0
  6255 matlab   CALL  linux_wait4(0xffffffff,0xbfbfa1b0,0,0)
 
If the above code in kern_exit.c is replaced with

        if (p->p_sigparent && p->p_pptr != initproc) {
                psignal(p->p_pptr, p->p_sigparent);
        } else if (p->p_sigparent != 0) {
                psignal(p->p_pptr, SIGCHLD);
        }

so that no SIGCHLD is sent when p_sigparent is zero, matlab reaps the
thread. The ktrace output below has matlab pid = 808 and thread #1 pid = 857.

   857 matlab   CALL  linux_kill(0x328,0x20)
   808 matlab   PSIG  SIG(null) caught handler=0x28c96e10 mask=0x80000000 code=0x0
   857 matlab   RET   linux_kill 0
   857 matlab   CALL  exit(0)
   808 matlab   RET   linux_rt_sigsuspend -1 errno 4 Interrupted system call
   808 matlab   CALL  linux_sigreturn(0xbfbfa928)
   808 matlab   RET   linux_sigreturn JUSTRETURN
   808 matlab   CALL  linux_wait4(0x359,0,0x80000000,0)
   808 matlab   RET   linux_wait4 857/0x359
   808 matlab   CALL  munmap(0x2d75d000,0x1000)
   808 matlab   RET   munmap 0
   808 matlab   CALL  exit(0)
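The linux_wait4 arguments in the two traces can be decoded with a small sketch. It assumes ktrace prints each argument as a raw 32-bit word and takes __WCLONE = 0x80000000 from Linux's <linux/wait.h>; the struct and function names are hypothetical. In the patched run matlab waits for pid 0x359 = 857 (thread #1 itself) with __WCLONE set, whereas the hanging run waited for any child (0xffffffff = -1) with options 0.

```c
#include <stdint.h>

#define LINUX_WCLONE 0x80000000u  /* __WCLONE, as in Linux's <linux/wait.h> */

/* Decoded pid and options words of a ktrace'd linux_wait4() call. */
struct wait4_call {
    int32_t pid;     /* -1: any child; > 0: that child only */
    int     wclone;  /* nonzero if __WCLONE was requested */
};

/* Reinterpret the raw pid word as signed without relying on an
 * implementation-defined unsigned-to-signed conversion. */
static struct wait4_call decode_wait4(uint32_t raw_pid, uint32_t raw_options)
{
    struct wait4_call c;

    c.pid = raw_pid <= INT32_MAX ? (int32_t)raw_pid
                                 : -(int32_t)(UINT32_MAX - raw_pid) - 1;
    c.wclone = (raw_options & LINUX_WCLONE) != 0;
    return c;
}
```

decode_wait4(0x359, 0x80000000) yields pid 857 with wclone set; decode_wait4(0xffffffff, 0) yields pid -1 with wclone clear.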
  

>How-To-Repeat:
	Run matlab with its Java VM enabled and type "exit" at the prompt.
>Fix:

The code snippet above is suggested as a change to kern_exit.c,
but it is probably dangerous as it stands, because it changes the
exit signalling behaviour.

Maintainers of kern_exit.c and the linuxulator are requested to
implement a more robust solution.
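To see how far the suggested change reaches, it can be modelled the same way as the original branch. This is a hypothetical sketch (names are ours; SIGCHLD from <signal.h> stands in for the kernel's definition). It returns the same signal as the original code in every case except p_sigparent == 0, where no signal is posted at all, even when the parent is initproc; that is the behavioural difference a more robust fix would need to account for.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/*
 * Hypothetical model of the suggested exit1() replacement.
 *   sigparent      - the exiting process's p_sigparent
 *   parent_is_init - nonzero when p->p_pptr == initproc
 * Returns the signal that would be posted to the parent, or 0 for none.
 */
static int patched_exit_signal(int sigparent, int parent_is_init)
{
    if (sigparent && !parent_is_init)
        return sigparent;  /* unchanged: custom exit signal */
    else if (sigparent != 0)
        return SIGCHLD;    /* unchanged: parent is initproc */
    return 0;              /* new: p_sigparent == 0 means no signal */
}
```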
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: truckman 
State-Changed-When: Wed Mar 3 01:30:52 PST 2004 
State-Changed-Why:  
Fix committed to HEAD in kern_exit.c rev 1.222.

Fix committed to RELENG_4 in kern_exit.c rev 1.92.2.13. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=42457 
>Unformatted:
