From dick@ns.tar.com Mon Mar 15 06:37:24 1999
Return-Path: <dick@ns.tar.com>
Received: from ns.tar.com (ns.tar.com [204.95.187.2])
	by hub.freebsd.org (Postfix) with ESMTP id 3045214D7D
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 15 Mar 1999 06:35:58 -0800 (PST)
	(envelope-from dick@ns.tar.com)
Received: (from dick@localhost)
	by ns.tar.com (8.9.3/8.9.3) id IAA39811;
	Mon, 15 Mar 1999 08:35:39 -0600 (CST)
	(envelope-from dick)
Message-Id: <199903151435.IAA39811@ns.tar.com>
Date: Mon, 15 Mar 1999 08:35:39 -0600 (CST)
From: dick@tar.com
Sender: dick@ns.tar.com
Reply-To: dick@tar.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: [PATCH included]malloc/free breaks in certain threaded cases
X-Send-Pr-Version: 3.2

>Number:         10599
>Category:       misc
>Synopsis:       [PATCH included]malloc/free breaks in certain threaded cases
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 15 06:40:01 PST 1999
>Closed-Date:    Fri Jul 20 14:44:39 PDT 2001
>Last-Modified:  Fri Jul 20 14:45:55 PDT 2001
>Originator:     Richard Seaman Jr.
>Release:        FreeBSD 4.0-CURRENT i386
>Organization:
>Environment:

	
While I believe the problem could theoretically occur at any time, it
has only been exposed running SMP kernel threads (linux threads port,
see http://lt.tar.com) using Luoqi Chen's VMSpace SMP patches
(see http://www.freebsd.org/~luoqi).  Three example cases can then
be seen when running the ACE threads tests with optimization enabled
(see http://www.pinyon.orv/ace).  I have had another report of the
problem, but apparently involving a private app I don't have access to.

>Description:

	
Malloc/free in libc serialize access in threaded cases by using the
THREAD_LOCK/THREAD_UNLOCK macros.  These are in turn defined in terms
of the _SPINLOCK/_SPINUNLOCK macros in src/lib/libc/include/spinlock.h.
_SPINLOCK is implemented by _spinlock() that the threaded library or
application must supply to override the weak alias definition of
_spinlock contained in _spinlock_stub.c.  However, _SPINUNLOCK is
implemented directly, independently of the implementation of
_spinlock().

The implementation of _SPINUNLOCK zeroes out the access_lock field of
the spinlock_t structure, but not any of the other fields.  If an
implementation of _spinlock() uses the other fields of the spinlock_t
structure (e.g. lock_owner; see libc_r's uthread_spinlock.c for an
example), then depending on when and how they are used, a race
condition can develop in which _spinlock() reads stale values of
those fields.  In the specific problem case, the lock_owner field is
used to detect recursive calls into _spinlock().  Because a stale
value is read, recursion is incorrectly detected.
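
For illustration only, the interleaving can be replayed in a single
thread.  This is a sketch, not the libc_r code: demo_spinlock_t,
demo_test_and_set() and replay_false_recursion() are hypothetical
names, and it assumes a _spinlock() that checks lock_owner whenever it
finds access_lock already set.

```c
/* Hypothetical stand-in for spinlock_t; only the two fields discussed. */
typedef struct {
	volatile long access_lock;
	volatile long lock_owner;
} demo_spinlock_t;

/* Behaviour of the current _SPINUNLOCK macro: clears access_lock only. */
static void
demo_unlock_old(demo_spinlock_t *lck)
{
	lck->access_lock = 0;
}

/* An unlock that also clears the owner before releasing the lock. */
static void
demo_unlock_fixed(demo_spinlock_t *lck)
{
	lck->lock_owner = 0;
	lck->access_lock = 0;
}

/*
 * Non-atomic stand-in for a test-and-set primitive.  This is only
 * acceptable because the replay below runs in a single thread.
 */
static long
demo_test_and_set(volatile long *p)
{
	long old = *p;

	*p = 1;
	return (old);
}

/*
 * Replay the problem interleaving for threads A and B.  Returns 1 if
 * A wrongly concludes that it already owns the lock, 0 if it does
 * not, and -1 if the replay itself went wrong.
 */
static int
replay_false_recursion(void (*unlock)(demo_spinlock_t *))
{
	demo_spinlock_t lck = { 0, 0 };
	const long thread_a = 1;

	/* A takes the lock, records ownership, and releases it. */
	if (demo_test_and_set(&lck.access_lock) != 0)
		return (-1);
	lck.lock_owner = thread_a;
	unlock(&lck);

	/* B wins the lock but has not yet written lock_owner. */
	if (demo_test_and_set(&lck.access_lock) != 0)
		return (-1);

	/* A tries again, finds the lock busy, and checks for recursion. */
	if (demo_test_and_set(&lck.access_lock) != 0 &&
	    lck.lock_owner == thread_a)
		return (1);
	return (0);
}
```

Called with demo_unlock_old the replay reports false recursion; called
with demo_unlock_fixed it does not, because A no longer finds its own
id left behind in lock_owner.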

To my knowledge, this problem has not surfaced in libc_r, in part
because the _spinlock() implementation in uthread_spinlock.c does
a (wasteful) call to sched_yield before accessing any of the
other elements of the spinlock_t structure.  This improves the
likelihood that the race condition will never surface.  Also, the fact
that libc_r does not use multiple processors for threads greatly
reduces the likelihood of exposing this problem.
 

>How-To-Repeat:

	
Compile the ACE tests noted above on a fastish (PII 350 or better) SMP
machine and run the tests.  Compile the ACE libraries and tests with
optimization on.

>Fix:
	
One simple fix would be to change the definition of _SPINUNLOCK to
zero out the other elements (or at least the lock_owner element) of
the spinlock_t. 
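
A minimal sketch of this first option, for illustration only.
DEMO_SPINUNLOCK and demo_spinlock_t are hypothetical stand-ins for the
real macro and for spinlock_t.  The sketch relies on stores becoming
visible in program order; on hardware that can reorder stores a
barrier would also be needed, which is not shown here.

```c
/* Hypothetical stand-in for spinlock_t; only the two fields discussed. */
typedef struct {
	volatile long access_lock;
	volatile long lock_owner;
} demo_spinlock_t;

/*
 * Clear lock_owner first, then release access_lock, so that no other
 * thread can take the lock while the previous owner is still recorded.
 * The do/while (0) wrapper keeps both statements together when the
 * macro is used as the body of an unbraced if or else.
 */
#define DEMO_SPINUNLOCK(_lck) do {		\
	(_lck)->lock_owner = 0;			\
	(_lck)->access_lock = 0;		\
} while (0)
```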

Another solution is to make sure the _spinlock() implementation does
not access the other elements of spinlock_t unless it knows they are
in a good state.  However, this would make it impossible to detect
recursive calls into _spinlock().  In this case, a recursive call into
malloc/free would result in a deadlock, instead of the current behaviour,
which is to print a warning about the recursive call and to abort.
AFAIK, recursive calls to malloc/free can only result from a programming
error in which malloc/free are incorrectly called in a signal handler.

However, it seems to me the better solution is to require that the
threaded library/app that implements _spinlock() should also implement
a _spinunlock() that matches.  The patch below takes this approach.

Also note that to properly implement a "recursive" spinlock, which is
(sort of) what uthread_spinlock.c does, there should really be a 
count element added to the spinlock_t structure.  Otherwise the
spinunlock function won't know when to really release the lock.
I believe the current libc_r uthread_spinlock.c implementation
is still slightly buggy in this respect.
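
To illustrate the count idea, here is a sketch of a recursive spinlock
using C11 atomics.  It is not part of the patch and is not the libc_r
implementation: demo_rspinlock_t, demo_rlock(), demo_runlock() and the
explicit self argument (a non-zero id for the calling thread) are all
hypothetical.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical recursive variant of spinlock_t with an added count. */
typedef struct {
	atomic_flag	access_lock;
	atomic_long	lock_owner;	/* 0 means unowned */
	long		count;		/* touched only by the owner */
} demo_rspinlock_t;

#define	DEMO_RSPINLOCK_INITIALIZER	{ ATOMIC_FLAG_INIT, 0, 0 }

static void
demo_rlock(demo_rspinlock_t *lck, long self)
{
	/*
	 * lock_owner can only equal self if this thread holds the
	 * lock, because demo_runlock() clears it before releasing.
	 */
	if (atomic_load(&lck->lock_owner) == self) {
		lck->count++;
		return;
	}
	/* Spin, yielding the processor, until the lock is free. */
	while (atomic_flag_test_and_set(&lck->access_lock))
		sched_yield();
	atomic_store(&lck->lock_owner, self);
	lck->count = 1;
}

static void
demo_runlock(demo_rspinlock_t *lck)
{
	/* Inner unlocks only drop the count. */
	if (--lck->count > 0)
		return;
	/* Outermost unlock: clear the owner, then release the lock. */
	atomic_store(&lck->lock_owner, 0);
	atomic_flag_clear(&lck->access_lock);
}
```

With the count in place, only the outermost unlock releases the lock,
so a nested lock/unlock pair cannot release it while the outer caller
still depends on it.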

Index: gen/_spinlock_stub.c
===================================================================
RCS file: /home/ncvs/src/lib/libc/gen/_spinlock_stub.c,v
retrieving revision 1.3
diff -c -r1.3 _spinlock_stub.c
*** _spinlock_stub.c	1998/06/09 08:32:22	1.3
--- _spinlock_stub.c	1999/03/15 13:39:48
***************
*** 40,49 ****
  #include "spinlock.h"
  
  /*
!  * Declare weak references in case the application is not linked
!  * with libpthread.
   */
  #pragma weak _spinlock=_spinlock_stub
  #pragma weak _spinlock_debug=_spinlock_debug_stub
  
  /*
--- 40,52 ----
  #include "spinlock.h"
  
  /*
!  * Declare weak references in case the application is not linked 
!  * with libpthread, and in the additional case of _spinunlock, in
!  * the event that libpthread is an older version that does not implement 
!  * _spinunlock.
   */
  #pragma weak _spinlock=_spinlock_stub
+ #pragma weak _spinunlock=_spinunlock_stub
  #pragma weak _spinlock_debug=_spinlock_debug_stub
  
  /*
***************
*** 61,64 ****
--- 64,88 ----
  _spinlock_debug_stub(spinlock_t *lck, char *fname, int lineno)
  {
  }
+ 
+ /*
+  * This function is a stub for the spinunlock function in libpthread.
+  */
+ void
+ _spinunlock_stub(spinlock_t *lck)
+ {
+ 	/*
+ 	 * We actually do something here since there wasn't a
+ 	 * _spinunlock function until recently.  This will help
+ 	 * retain compatibility with an older app that implements
+ 	 * _spinlock, but not _spinunlock.  This implements the
+ 	 * EXACT behaviour of the old _SPINUNLOCK macro, even
+ 	 * though for some applications this might be buggy.
+ 	 * In general, if the lock_owner element is used, this
+ 	 * line should be added before the access_lock is zeroed:
+ 	 *	lck->lock_owner = 0;
+ 	 */
+ 	lck->access_lock = 0;
+ }
+ 
  #endif
Index: include/spinlock.h
===================================================================
RCS file: /home/ncvs/src/lib/libc/include/spinlock.h,v
retrieving revision 1.3
diff -c -r1.3 spinlock.h
*** spinlock.h	1998/06/09 08:28:49	1.3
--- spinlock.h	1999/03/13 19:59:07
***************
*** 52,58 ****
  
  #define	_SPINLOCK_INITIALIZER	{ 0, 0, 0, 0 }
  
! #define _SPINUNLOCK(_lck)	(_lck)->access_lock = 0
  #ifdef	_LOCK_DEBUG
  #define	_SPINLOCK(_lck)		_spinlock_debug(_lck, __FILE__, __LINE__)
  #else
--- 52,58 ----
  
  #define	_SPINLOCK_INITIALIZER	{ 0, 0, 0, 0 }
  
! #define _SPINUNLOCK(_lck)	_spinunlock(_lck)
  #ifdef	_LOCK_DEBUG
  #define	_SPINLOCK(_lck)		_spinlock_debug(_lck, __FILE__, __LINE__)
  #else
***************
*** 65,70 ****
--- 65,71 ----
  __BEGIN_DECLS
  long	_atomic_lock __P((volatile long *));
  void	_spinlock __P((spinlock_t *));
+ void	_spinunlock __P((spinlock_t *));
  void	_spinlock_debug __P((spinlock_t *, char *, int));
  __END_DECLS

Index: uthread/uthread_spinlock.c
===================================================================
RCS file: /home/ncvs/src/lib/libc_r/uthread/uthread_spinlock.c,v
retrieving revision 1.4
diff -c -r1.4 uthread_spinlock.c
*** uthread_spinlock.c	1998/06/09 23:13:10	1.4
--- uthread_spinlock.c	1999/03/13 20:04:16
***************
*** 68,73 ****
--- 68,81 ----
  	lck->lock_owner = (long) _thread_run;
  }
  
+ void
+ _spinunlock(spinlock_t *lck)
+ {
+ 
+ 	lck->lock_owner = 0;
+ 	lck->access_lock = 0;
+ }
+ 
  /*
   * Lock a location for the running thread. Yield to allow other
   * threads to run if this thread is blocked because the lock is
  


 
	


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: gnats-admin->freebsd-bugs 
Responsible-Changed-By: steve 
Responsible-Changed-When: Sun Mar 21 17:02:28 PST 1999 
Responsible-Changed-Why:  
Misfiled PR. 
State-Changed-From-To: open->feedback 
State-Changed-By: mike 
State-Changed-When: Thu Jul 19 18:01:49 PDT 2001 
State-Changed-Why:  

Does this problem still occur in newer versions of FreeBSD, 
such as 4.3-RELEASE? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=10599 
State-Changed-From-To: feedback->closed 
State-Changed-By: mike 
State-Changed-When: Fri Jul 20 14:44:39 PDT 2001 
State-Changed-Why:  

At the originator's request, I am closing this PR and labeling it 
obsolete. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=10599 
>Unformatted:
