From nobody@FreeBSD.org  Thu Jul  5 16:53:33 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 5978937B40A
	for <freebsd-gnats-submit@FreeBSD.org>; Thu,  5 Jul 2001 16:53:33 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.3/8.11.3) id f65NrX458621;
	Thu, 5 Jul 2001 16:53:33 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200107052353.f65NrX458621@freefall.freebsd.org>
Date: Thu, 5 Jul 2001 16:53:33 -0700 (PDT)
From: Nathan Mower <nmower@verio.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Race condition in run-time linker
X-Send-Pr-Version: www-1.0

>Number:         28746
>Category:       i386
>Synopsis:       Race condition in run-time linker
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    jdp
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jul 05 17:00:03 PDT 2001
>Closed-Date:    Mon Jul 9 10:32:58 PDT 2001
>Last-Modified:  Mon Jul 09 10:33:57 PDT 2001
>Originator:     Nathan Mower
>Release:        4.2
>Organization:
Verio
>Environment:
FreeBSD ft.iserver.com 4.2-RELEASE FreeBSD 4.2-RELEASE #2a: Thu Jun 28 11:27:27 MDT 2001 root@fc:/usr/src/sys/compile/VKERN  i386

>Description:
There seems to be a race condition in the ELF run-time linker.  As
near as I can tell, the situation is this: _rtld_bind() calls
rlock_acquire(), but before it reaches rlock_release(), a signal is
caught.  The signal handler calls exit(), so the __atexit list is
traversed, calling rtld_exit(), which calls wlock_acquire().  That call
spins waiting for the read lock to be released, which can never happen
because the interrupted _rtld_bind() cannot resume until the handler
returns.  The process hangs.
>How-To-Repeat:
Put heavy traffic on an Apache web server (I use torture.pl) and
frequently send SIGUSR1 to the child Apache processes.  The bug is very
intermittent, as you can well imagine.
>Fix:
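Known workaround: run Apache with LD_BIND_NOW set in its environment,
so that all function bindings are resolved at load time rather than
lazily through _rtld_bind().  I am not sure of the right fix; it might
require blocking signals between rlock_acquire() and rlock_release()
in _rtld_bind().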
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->jdp 
Responsible-Changed-By: dd 
Responsible-Changed-When: Fri Jul 6 06:05:04 PDT 2001 
Responsible-Changed-Why:  
jdp seems to make most of the changes to rtld. 
jdp: this isn't one of the best bug reports in the world, but perhaps 
it'll alert you to a possible problem. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=28746 

From: John Polstra <jdp@polstra.com>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: i386/28746: Race condition in run-time linker
Date: Fri, 06 Jul 2001 17:51:28 -0700 (PDT)

 Actually, I think this is a good bug report.  It's very concise, but
 the submitter's analysis of the problem is stated clearly, and I
 believe it's 100% correct.  This kind of stuff is not easy to debug,
 so he must have done quite a bit of work to diagnose the problem.
 (Thank you Nathan!)
 
 I'll have to think about the best way to fix it.  I want to avoid
 blocking/unblocking signals in rlock_acquire/rlock_release if
 possible, because of the cost of the system calls.  I have a couple
 other ideas, but they're not fleshed out yet.  Stay tuned.
State-Changed-From-To: open->feedback 
State-Changed-By: jdp 
State-Changed-When: Sun Jul 8 15:56:45 PDT 2001 
State-Changed-Why:  
I think the submitter's analysis of this problem is exactly right. 
However, after looking into it some more I am inclined to close 
this PR on the grounds that the bug is in apache rather than in 
FreeBSD.  According to the POSIX standard, a signal handler is 
allowed to call _exit() but not exit().  If apache's signal handler 
called _exit() as it ought to do, the atexit() processing would be 
bypassed, the dynamic linker's termination function would not be 
called, and this problem would not appear. 

If I could see a reasonable way to fix this in the dynamic linker 
without killing performance, I'd gladly fix it.  But barring that, 
I think I'm going to have to point to POSIX and say it's not our 
bug. 

I'm putting the PR into the feedback state first, to give the 
submitter an opportunity to disagree. 


From: Nathan Mower <nmower@verio.net>
To: freebsd-gnats-submit@FreeBSD.org, nmower@verio.net
Cc:  
Subject: Re: i386/28746: Race condition in run-time linker
Date: Mon, 09 Jul 2001 11:24:27 -0600

 No disagreement here, John.  I'll submit a bug report to Apache.org.
 Thanks for taking a look at it.
State-Changed-From-To: feedback->closed 
State-Changed-By: jdp 
State-Changed-When: Mon Jul 9 10:32:58 PDT 2001 
State-Changed-Why:  
Submitter says he doesn't object to closing this PR, since the 
actual bug is in apache.  He will send a bug report to the apache 
team. 

>Unformatted:
