From nobody@FreeBSD.org  Wed Feb 11 21:43:15 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9022C106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 11 Feb 2009 21:43:15 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 638118FC13
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 11 Feb 2009 21:43:15 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n1BLhE8j083947
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 11 Feb 2009 21:43:14 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n1BLhEtL083946;
	Wed, 11 Feb 2009 21:43:14 GMT
	(envelope-from nobody)
Message-Id: <200902112143.n1BLhEtL083946@www.freebsd.org>
Date: Wed, 11 Feb 2009 21:43:14 GMT
From: Guillaume Morin <guillaume@morinfr.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: c++ exceptions very slow on FreeBSD 7.1/amd64
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         131597
>Category:       kern
>Synopsis:       [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Feb 11 21:50:03 UTC 2009
>Closed-Date:    
>Last-Modified:  Mon Jul  1 00:30:00 UTC 2013
>Originator:     Guillaume Morin
>Release:        7.1-RELEASE
>Organization:
>Environment:
FreeBSD freebsd 7.1-RELEASE FreeBSD 7.1-RELEASE #0: Thu Jan  1 08:58:24 UTC 2009     root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

>Description:
I have a very simple C++ program that throws about 100,000 exceptions (99,999, to be exact).  Compiled on my Core 2 Duo running FreeBSD, it takes over 4 seconds to run.  On my Linux box, a 4-year-old Athlon 64, it takes 0.4 seconds.  The FreeBSD implementation makes a *lot* of syscalls: truss records about a million, almost all of them sigprocmask().

We found this problem while running the test code for our libraries, which is very exception-heavy.

Here is the program:
$cat testexcept.cpp
int main(void) {
    int i = 0;
    while(1) {
        ++i;
        try {
            if(i == 100000) {
                break;
            }
            throw 0;
        }
        catch(...) {
        }
    }

    return 0;
}
$g++ -v
Using built-in specs.
Target: amd64-undermydesk-freebsd
Configured with: FreeBSD/amd64 system compiler
Thread model: posix
gcc version 4.2.1 20070719  [FreeBSD]
$g++  -o t testexcept.cpp
$time ./t

real    0m4.436s
user    0m4.292s
sys     0m0.144s
$truss -oout ./t
$wc -l out
 1000072 out
$grep sigprocmask out | sort | uniq -c
499999 sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
499999 sigprocmask(SIG_SETMASK,0x0,0x0)          = 0 (0x0)



Same program on the linux box
=============================

linux $g++-4.2 -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.4 (Debian 4.2.4-6)
linux $g++-4.2 -m64 -o t testexcept.cpp
linux $time ./t

real    0m0.421s
user    0m0.404s
sys     0m0.000s
linux $strace -oout ./t
linux $wc -l out
54 out
linux $


Both machines run at a similar clock frequency (around 2 GHz), but the Core 2 Duo should be faster.  Both boxes were very lightly loaded.
>How-To-Repeat:
Compile and run the program :)
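
For a per-exception figure rather than a wall-clock total, a self-timing variant of the test can be sketched as below. This is an illustration only and not part of the original report: the names ThrowTiming and time_throws are hypothetical, and std::chrono needs a C++11 compiler, which is newer than the g++ 4.2.1 shown above.

```cpp
#include <chrono>

// Hypothetical result type: how many exceptions were caught and how long
// the whole loop took, in seconds of wall-clock time.
struct ThrowTiming {
    int caught;
    double seconds;
};

// Hypothetical helper: throws and catches `count` exceptions, one per
// iteration, and times the loop with a monotonic clock.
static ThrowTiming time_throws(int count)
{
    ThrowTiming result = {0, 0.0};
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) {
        try {
            throw 0;
        } catch (...) {
            ++result.caught;
        }
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    result.seconds = elapsed.count();
    return result;
}
```

Calling time_throws(100000) from main() and dividing seconds by caught gives the cost of one throw/catch pair, which can then be compared across the FreeBSD and Linux machines.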
>Fix:


>Release-Note:
>Audit-Trail:

From: Mikolaj Golub <to.my.trociny@gmail.com>
To: bug-followup@FreeBSD.org
Cc: Guillaume Morin <guillaume@morinfr.org>
Subject: Re: misc/131597: c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Thu, 12 Feb 2009 12:25:18 +0200

 It looks like 6.x is not affected by this problem (or at least the situation
 is much better here). I have checked on several 6.x hosts and the output looks
 like this:
 
         0.55 real         0.55 user         0.00 sys
       1000  maximum resident set size
          4  average shared memory size
         33  average unshared data size
        133  average unshared stack size
         98  page reclaims
          0  page faults
          0  swaps
          0  block input operations
          0  block output operations
          0  messages sent
          0  messages received
          0  signals received
          1  voluntary context switches
         48  involuntary context switches
 
 In all cases sys time was zero. While on my 7.1-STABLE i386 (Core(TM)2 CPU 6400 @ 2.13GHz) box :
 
         1.93 real         0.58 user         1.34 sys
       1252  maximum resident set size
          4  average shared memory size
        732  average unshared data size
        128  average unshared stack size
        115  page reclaims
          0  page faults
          0  swaps
          0  block input operations
          0  block output operations
          0  messages sent
          0  messages received
          0  signals received
          1  voluntary context switches
         32  involuntary context switches
 
 or 7.1-RELEASE-p1 amd64 (Xeon(R) CPU E5450 @ 3.00GHz):
 
         3.93 real         3.80 user         0.12 sys
       1736  maximum resident set size
          4  average shared memory size
       1024  average unshared data size
        128  average unshared stack size
        135  page reclaims
          0  page faults
          0  swaps
          0  block input operations
          0  block output operations
          0  messages sent
          0  messages received
          0  signals received
          1  voluntary context switches
         41  involuntary context switches
 
 I have tried both g++ 3.4 and 4.2 and haven't noticed a significant
 difference. In my tests 3.4 was a bit faster, but that might be the
 influence of external factors (host load).
 
 -- 
 Mikolaj Golub

From: John Baldwin <jhb@freebsd.org>
To: bug-followup@freebsd.org,
 guillaume@morinfr.org
Cc: kib@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Thu, 22 Apr 2010 16:09:34 -0400

 I tracked the sigprocmask() system calls down to the operations to acquire a 
 write lock in the runtime linker.  The lock was added to fix an earlier bug 
 with throwing exceptions in multithreaded C++ apps.  The relevant commit that 
 added the lock is this:
 
    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 
 Are exceptions permitted during a signal handler?  If not, then in theory we 
 would not need to invoke sigprocmask() for this particular lock perhaps?  I'm 
 not sure how easy that would be to achieve given the hooks to allow the thread 
 library to overload the locking routines.  Also, this doesn't explain the lack 
 of sigprocmask() calls under i386.  FreeBSD/i386 should be using the same 
 locking code and thus invoking sigprocmask() for each exception as well.
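
The pattern described above, a lock that keeps signal handlers out by masking signals, can be sketched as follows. This is a minimal illustration under stated assumptions, not rtld's actual code: the class SignalBlockingGuard and the helper is_blocked are hypothetical names. It shows why every acquire/release pair costs two sigprocmask() system calls, matching the SIG_BLOCK and SIG_SETMASK pairs in the truss output.

```cpp
#include <signal.h>

// Hypothetical guard, not taken from rtld: blocks all signals for the
// lifetime of a critical section and restores the previous mask afterwards.
class SignalBlockingGuard {
public:
    SignalBlockingGuard()
    {
        sigset_t all;
        sigfillset(&all);
        // First system call: block everything, remember the old mask.
        sigprocmask(SIG_BLOCK, &all, &saved_);
    }
    ~SignalBlockingGuard()
    {
        // Second system call: put the old mask back.
        sigprocmask(SIG_SETMASK, &saved_, nullptr);
    }
    SignalBlockingGuard(const SignalBlockingGuard &) = delete;
    SignalBlockingGuard &operator=(const SignalBlockingGuard &) = delete;

private:
    sigset_t saved_;
};

// Hypothetical helper: reports whether `signo` is currently blocked.
// Passing a null set makes sigprocmask() query the mask without changing it.
static bool is_blocked(int signo)
{
    sigset_t current;
    sigprocmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, signo) == 1;
}
```

If such a guard sits on the path taken by every thrown exception, the system-call count grows linearly with the number of exceptions, which is consistent with what the report measured.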
 
 -- 
 John Baldwin

From: Kostik Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org, guillaume@morinfr.org, kan@freebsd.org,
        davidxu@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 23 Apr 2010 15:25:01 +0300

 --kG2acDqmwoBDcCHP
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 > I tracked the sigprocmask() system calls down to the operations to
 > acquire a write lock in the runtime linker. The lock was added to fix
 > an earlier bug with throwing exceptions in multithreaded C++ apps. The
 > relevant commit that added the lock is this:
 >
 >    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 >
 > Are exceptions permitted during a signal handler? If not, then in
 > theory we would not need to invoke sigprocmask() for this particular
 > lock perhaps? I'm not sure how easy that would be to achieve given the
 > hooks to allow the thread library to overload the locking routines.
 > Also, this doesn't explain the lack of sigprocmask() calls under i386.
 > FreeBSD/i386 should be using the same locking code and thus invoking
 > sigprocmask() for each exception as well.
 
 Throwing an exception during asynchronous signal execution raises undefined
 behaviour, AFAIK. sigprocmask() is there to support libc_r, and cannot
 be removed as long as we need to provide FreeBSD 4.x compatibility.
 
 What can be done is to have the modern libc provide a completely dummy
 implementation of the rtld locks. Fortunately, libthr only injects its
 rtld lock implementation into rtld on first thread creation. A simple
 stack of RtldLockInfo seems to give us proper restoration of the
 libc-provided locks, instead of the default locks, when the process is
 back to single-threaded.
 
 The prototype is below. It does not work for static linking, and this is
 the first usage of __attribute__((constructor)), at least in libc.
 Alexander, I do remember about -DDEBUG in rtld-elf/Makefile.
 
 diff --git a/lib/libc/Makefile b/lib/libc/Makefile
 index b58b6cb..be41c1c 100644
 --- a/lib/libc/Makefile
 +++ b/lib/libc/Makefile
 @@ -16,6 +16,8 @@ SHLIB_MAJOR= 7
  WARNS?=	2
  CFLAGS+=-I${.CURDIR}/include -I${.CURDIR}/../../include
  CFLAGS+=-I${.CURDIR}/${MACHINE_ARCH}
 +CFLAGS+=-I${.CURDIR}/../../libexec/rtld-elf
 +CFLAGS+=-I${.CURDIR}/../../libexec/rtld-elf/${MACHINE_ARCH}
  CFLAGS+=-DNLS
  CLEANFILES+=tags
  INSTALL_PIC_ARCHIVE=
 diff --git a/lib/libc/gen/Makefile.inc b/lib/libc/gen/Makefile.inc
 index 2f562da..fadf339 100644
 --- a/lib/libc/gen/Makefile.inc
 +++ b/lib/libc/gen/Makefile.inc
 @@ -10,7 +10,7 @@ SRCS+=  __getosreldate.c __xuname.c \
  	alarm.c arc4random.c assert.c basename.c check_utility_compat.c \
  	clock.c closedir.c confstr.c \
  	crypt.c ctermid.c daemon.c devname.c dirname.c disklabel.c \
 -	dlfcn.c drand48.c erand48.c err.c errlst.c errno.c \
 +	dlfcn.c dllock.c drand48.c erand48.c err.c errlst.c errno.c \
  	exec.c fdevname.c feature_present.c fmtcheck.c fmtmsg.c fnmatch.c \
  	fpclassify.c frexp.c fstab.c ftok.c fts.c fts-compat.c ftw.c \
  	getbootfile.c getbsize.c \
 diff --git a/lib/libc/gen/dllock.c b/lib/libc/gen/dllock.c
 new file mode 100644
 index 0000000..0980147
 --- /dev/null
 +++ b/lib/libc/gen/dllock.c
 @@ -0,0 +1,107 @@
 +/*-
 + * Copyright (c) 2010 Konstantin Belousov
 + * All rights reserved.
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in the
 + *    documentation and/or other materials provided with the distribution.
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 + * SUCH DAMAGE.
 + */
 +
 +#include <sys/cdefs.h>
 +__FBSDID("$FreeBSD$");
 +
 +#include <sys/types.h>
 +#include <machine/atomic.h>
 +#include <stdlib.h>
 +
 +#include "rtld_lock.h"
 +
 +static void *
 +_dummy_dl_lock_create(void)
 +{
 +
 +	return ((void *)1);
 +}
 +
 +static void
 +_dummy_dl_lock_destroy(void *lock __unused)
 +{
 +}
 +
 +static void
 +_dummy_dl_rlock_acquire(void *lock __unused)
 +{
 +}
 +
 +static void
 +_dummy_dl_wlock_acquire(void *lock __unused)
 +{
 +}
 +
 +static void
 +_dummy_dl_lock_release(void *lock __unused)
 +{
 +}
 +
 +static int _dummy_dl_mask;
 +
 +static int
 +_dummy_dl_set_flag(int mask)
 +{
 +	int old, new;
 +
 +	do {
 +		old = _dummy_dl_mask;
 +		new = old | mask;
 +	} while (!atomic_cmpset_acq_int(&_dummy_dl_mask, old, new));
 +	return (old);
 +}
 +
 +static int
 +_dummy_dl_clr_flag(int mask)
 +{
 +
 +	int old, new;
 +
 +	do {
 +		old = _dummy_dl_mask;
 +		new = old & (~mask);
 +	} while (!atomic_cmpset_rel_int(&_dummy_dl_mask, old, new));
 +	return (old);
 +}
 +
 +static void _dllock_init(void) __attribute__((constructor));
 +static void
 +_dllock_init(void)
 +{
 +	struct RtldLockInfo li;
 +
 +	li.lock_create  = _dummy_dl_lock_create;
 +	li.lock_destroy = _dummy_dl_lock_destroy;
 +	li.rlock_acquire = _dummy_dl_rlock_acquire;
 +	li.wlock_acquire = _dummy_dl_wlock_acquire;
 +	li.lock_release  = _dummy_dl_lock_release;
 +	li.thread_set_flag = _dummy_dl_set_flag;
 +	li.thread_clr_flag = _dummy_dl_clr_flag;
 +	li.at_fork = NULL;
 +
 +	_rtld_thread_init(&li);
 +}
 +
 diff --git a/libexec/rtld-elf/Makefile b/libexec/rtld-elf/Makefile
 index d6df617..d451681 100644
 --- a/libexec/rtld-elf/Makefile
 +++ b/libexec/rtld-elf/Makefile
 @@ -11,6 +11,7 @@ MAN=		rtld.1
  CSTD?=		gnu99
  CFLAGS+=	-Wall -DFREEBSD_ELF -DIN_RTLD
  CFLAGS+=	-I${.CURDIR}/${MACHINE_ARCH} -I${.CURDIR}
 +CFLAGS+=	-g -DDEBUG
  LDFLAGS+=	-nostdlib -e .rtld_start
  WARNS?=		2
  INSTALLFLAGS=	-C -b
 diff --git a/libexec/rtld-elf/rtld_lock.c b/libexec/rtld-elf/rtld_lock.c
 index c5e582e..5c8be68 100644
 --- a/libexec/rtld-elf/rtld_lock.c
 +++ b/libexec/rtld-elf/rtld_lock.c
 @@ -158,19 +158,30 @@ def_thread_clr_flag(int mask)
  /*
   * Public interface exposed to the rest of the dynamic linker.
   */
 -static struct RtldLockInfo lockinfo;
 +static struct RtldLockInfo pli_stack[8];
 +static int pli_current_idx = -1;
 +
 +static struct RtldLockInfo *
 +lockinfo(void)
 +{
 +
 +	if (pli_current_idx == -1)
 +		abort();
 +	return (&pli_stack[pli_current_idx]);
 +}
 +
  static struct RtldLockInfo deflockinfo;
  
  static __inline int
  thread_mask_set(int mask)
  {
 -	return lockinfo.thread_set_flag(mask);
 +	return lockinfo()->thread_set_flag(mask);
  }
  
  static __inline void
  thread_mask_clear(int mask)
  {
 -	lockinfo.thread_clr_flag(mask);
 +	lockinfo()->thread_clr_flag(mask);
  }
  
  #define	RTLD_LOCK_CNT	3
 @@ -190,7 +201,7 @@ rlock_acquire(rtld_lock_t lock)
  	    dbg("rlock_acquire: recursed");
  	    return (0);
  	}
 -	lockinfo.rlock_acquire(lock->handle);
 +	lockinfo()->rlock_acquire(lock->handle);
  	return (1);
  }
  
 @@ -201,7 +212,7 @@ wlock_acquire(rtld_lock_t lock)
  	    dbg("wlock_acquire: recursed");
  	    return (0);
  	}
 -	lockinfo.wlock_acquire(lock->handle);
 +	lockinfo()->wlock_acquire(lock->handle);
  	return (1);
  }
  
 @@ -211,7 +222,7 @@ rlock_release(rtld_lock_t lock, int locked)
  	if (locked == 0)
  	    return;
  	thread_mask_clear(lock->mask);
 -	lockinfo.lock_release(lock->handle);
 +	lockinfo()->lock_release(lock->handle);
  }
  
  void
 @@ -220,7 +231,7 @@ wlock_release(rtld_lock_t lock, int locked)
  	if (locked == 0)
  	    return;
  	thread_mask_clear(lock->mask);
 -	lockinfo.lock_release(lock->handle);
 +	lockinfo()->lock_release(lock->handle);
  }
  
  void
 @@ -243,7 +254,6 @@ lockdflt_init()
  	    rtld_locks[i].handle = NULL;
      }
  
 -    memcpy(&lockinfo, &deflockinfo, sizeof(lockinfo));
      _rtld_thread_init(NULL);
      /*
       * Construct a mask to block all signals except traps which might
 @@ -272,13 +282,33 @@ _rtld_thread_init(struct RtldLockInfo *pli)
  {
  	int flags, i;
  	void *locks[RTLD_LOCK_CNT];
 +	struct RtldLockInfo *prev_pli;
  
  	/* disable all locking while this function is running */
 -	flags =	thread_mask_set(~0);
 -
 -	if (pli == NULL)
 -		pli = &deflockinfo;
 -
 +	if (pli == NULL && pli_current_idx == -1)
 +		flags = def_thread_set_flag(~0);
 +	else
 +		flags =	thread_mask_set(~0);
 +
 +	if (pli == NULL) {
 +		if (pli_current_idx == -1) {
 +			pli_current_idx = 0;
 +			pli_stack[pli_current_idx] = deflockinfo;
 +			pli = &pli_stack[pli_current_idx];
 +			prev_pli = NULL;
 +		} else {
 +			prev_pli = &pli_stack[pli_current_idx];
 +			pli = &pli_stack[pli_current_idx--];
 +			if (pli_current_idx == -1)
 +				abort();
 +		}
 +	} else {
 +		prev_pli = &pli_stack[pli_current_idx];
 +		if (++pli_current_idx >=
 +		    sizeof(pli_stack) / sizeof(pli_stack[0]))
 +			abort();
 +		pli_stack[pli_current_idx] = *pli;
 +	}
  
  	for (i = 0; i < RTLD_LOCK_CNT; i++)
  		if ((locks[i] = pli->lock_create()) == NULL)
 @@ -290,12 +320,14 @@ _rtld_thread_init(struct RtldLockInfo *pli)
  		abort();
  	}
  
 -	for (i = 0; i < RTLD_LOCK_CNT; i++) {
 -		if (rtld_locks[i].handle == NULL)
 -			continue;
 -		if (flags & rtld_locks[i].mask)
 -			lockinfo.lock_release(rtld_locks[i].handle);
 -		lockinfo.lock_destroy(rtld_locks[i].handle);
 +	if (prev_pli != NULL) {
 +		for (i = 0; i < RTLD_LOCK_CNT; i++) {
 +			if (rtld_locks[i].handle == NULL)
 +				continue;
 +			if (flags & rtld_locks[i].mask)
 +				prev_pli->lock_release(rtld_locks[i].handle);
 +			prev_pli->lock_destroy(rtld_locks[i].handle);
 +		}
  	}
  
  	for (i = 0; i < RTLD_LOCK_CNT; i++) {
 @@ -304,15 +336,6 @@ _rtld_thread_init(struct RtldLockInfo *pli)
  			pli->wlock_acquire(rtld_locks[i].handle);
  	}
  
 -	lockinfo.lock_create = pli->lock_create;
 -	lockinfo.lock_destroy = pli->lock_destroy;
 -	lockinfo.rlock_acquire = pli->rlock_acquire;
 -	lockinfo.wlock_acquire = pli->wlock_acquire;
 -	lockinfo.lock_release  = pli->lock_release;
 -	lockinfo.thread_set_flag = pli->thread_set_flag;
 -	lockinfo.thread_clr_flag = pli->thread_clr_flag;
 -	lockinfo.at_fork = pli->at_fork;
 -
  	/* restore thread locking state, this time with new locks */
  	thread_mask_clear(~0);
  	thread_mask_set(flags);
 
 --kG2acDqmwoBDcCHP
 Content-Type: application/pgp-signature
 Content-Disposition: inline
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (FreeBSD)
 
 iEYEARECAAYFAkvRkZ0ACgkQC3+MBN1Mb4j5cACeNiBETt3fYfRk9AW5sEndBmjd
 U3AAoO6LQenG9fp1XjBTLJMrSDdC/a2Z
 =8mxC
 -----END PGP SIGNATURE-----
 
 --kG2acDqmwoBDcCHP--

From: John Baldwin <jhb@freebsd.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: bug-followup@freebsd.org,
 guillaume@morinfr.org,
 kan@freebsd.org,
 davidxu@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 23 Apr 2010 08:43:41 -0400

 On Friday 23 April 2010 8:25:01 am Kostik Belousov wrote:
 > On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 > > I tracked the sigprocmask() system calls down to the operations to
 > > acquire a write lock in the runtime linker. The lock was added to fix
 > > an earlier bug with throwing exceptions in multithreaded C++ apps. The
 > > relevant commit that added the lock is this:
 > >
 > >    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 > >
 > > Are exceptions permitted during a signal handler? If not, then in
 > > theory we would not need to invoke sigprocmask() for this particular
 > > lock perhaps? I'm not sure how easy that would be to achieve given the
 > > hooks to allow the thread library to overload the locking routines.
 > > Also, this doesn't explain the lack of sigprocmask() calls under i386.
 > > FreeBSD/i386 should be using the same locking code and thus invoking
 > > sigprocmask() for each exception as well.
 > 
 > Throwing an exception during asynchronous signal execution raises undefined
 > behaviour, AFAIK. sigprocmask() is there to support libc_r, and cannot
 > be removed as far as we need to provide FreeBSD 4.x compatibility.
 
 Hmmm.  Why does libthr use sigprocmask() for its rtld locks then?  Is that 
 just a copy-paste from libc_r that can be removed now?
 
 -- 
 John Baldwin

From: Kostik Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org, guillaume@morinfr.org, kan@freebsd.org,
        davidxu@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 23 Apr 2010 16:47:40 +0300

 --z118w8IfbP8nVdqq
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Fri, Apr 23, 2010 at 08:43:41AM -0400, John Baldwin wrote:
 > On Friday 23 April 2010 8:25:01 am Kostik Belousov wrote:
 > > On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 > > > I tracked the sigprocmask() system calls down to the operations to
 > > > acquire a write lock in the runtime linker. The lock was added to fix
 > > > an earlier bug with throwing exceptions in multithreaded C++ apps. The
 > > > relevant commit that added the lock is this:
 > > >
 > > > >    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 > > >
 > > > Are exceptions permitted during a signal handler? If not, then in
 > > > theory we would not need to invoke sigprocmask() for this particular
 > > > lock perhaps? I'm not sure how easy that would be to achieve given the
 > > > hooks to allow the thread library to overload the locking routines.
 > > > Also, this doesn't explain the lack of sigprocmask() calls under i386.
 > > > FreeBSD/i386 should be using the same locking code and thus invoking
 > > > sigprocmask() for each exception as well.
 > > 
 > > Throwing an exception during asynchronous signal execution raises undefined
 > > behaviour, AFAIK. sigprocmask() is there to support libc_r, and cannot
 > > be removed as far as we need to provide FreeBSD 4.x compatibility.
 > 
 > Hmmm.  Why does libthr use sigprocmask() for its rtld locks then?  Is that 
 > just a copy-paste from libc_r that can be removed now?
 
 Hmmm^2. It seems it is there to prevent recursive entry into rtld from
 signal handler, that may reference yet unresolved symbol, e.g. libc
 syscall wrapper, from PLT. So my patch is wrong.
 
 --z118w8IfbP8nVdqq
 Content-Type: application/pgp-signature
 Content-Disposition: inline
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (FreeBSD)
 
 iEYEARECAAYFAkvRpPwACgkQC3+MBN1Mb4he1gCg18kWbb7UFBC3TGpZ1fe7vJhU
 0lUAn2Rf4j2SWLC+hdPQqJs8Qn25q+0P
 =IpCA
 -----END PGP SIGNATURE-----
 
 --z118w8IfbP8nVdqq--

From: John Baldwin <jhb@freebsd.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: bug-followup@freebsd.org,
 guillaume@morinfr.org,
 kan@freebsd.org,
 davidxu@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 23 Apr 2010 10:21:41 -0400

 On Friday 23 April 2010 9:47:40 am Kostik Belousov wrote:
 > On Fri, Apr 23, 2010 at 08:43:41AM -0400, John Baldwin wrote:
 > > On Friday 23 April 2010 8:25:01 am Kostik Belousov wrote:
 > > > On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 > > > > I tracked the sigprocmask() system calls down to the operations to
 > > > > acquire a write lock in the runtime linker. The lock was added to fix
 > > > > an earlier bug with throwing exceptions in multithreaded C++ apps. The
 > > > > relevant commit that added the lock is this:
 > > > >
 > > > >    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 > > > >
 > > > > Are exceptions permitted during a signal handler? If not, then in
 > > > > theory we would not need to invoke sigprocmask() for this particular
 > > > > lock perhaps? I'm not sure how easy that would be to achieve given the
 > > > > hooks to allow the thread library to overload the locking routines.
 > > > > Also, this doesn't explain the lack of sigprocmask() calls under i386.
 > > > > FreeBSD/i386 should be using the same locking code and thus invoking
 > > > > sigprocmask() for each exception as well.
 > > > 
 > > > Throwing an exception during asynchronous signal execution raises undefined
 > > > behaviour, AFAIK. sigprocmask() is there to support libc_r, and cannot
 > > > be removed as far as we need to provide FreeBSD 4.x compatibility.
 > > 
 > > Hmmm.  Why does libthr use sigprocmask() for its rtld locks then?  Is that 
 > > just a copy-paste from libc_r that can be removed now?
 > 
 > Hmmm^2. It seems it is there to prevent recursive entry into rtld from
 > signal handler, that may reference yet unresolved symbol, e.g. libc
 > syscall wrapper, from PLT. So my patch is wrong.
 
 Presumably we could use a different type of lock that doesn't use sigprocmask()
 to serialize calls to dl_iterate_phdr()?  I'm not sure if libthr would really
 need to overwrite the behavior of that lock or if a simple
 atomic_cmpset()-based mutex would always be fine.
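
A minimal sketch of such a compare-and-set mutex, written with std::atomic rather than FreeBSD's atomic_cmpset_*() primitives, might look like the following. CasMutex is a hypothetical name and this is not code from rtld or libthr. Acquiring and releasing it makes no system calls, but it provides no protection against a signal handler that re-enters the locked code on the same thread: the handler would spin forever.

```cpp
#include <atomic>

// Hypothetical spin mutex built on compare-and-set; illustration only.
class CasMutex {
public:
    // Spin until the state changes atomically from 0 (free) to 1 (held).
    void lock()
    {
        int expected = 0;
        while (!state_.compare_exchange_weak(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
            // A failed exchange stores the observed value in `expected`,
            // so reset it before retrying.
            expected = 0;
        }
    }

    // Single attempt: returns true only if the mutex was free.
    bool try_lock()
    {
        int expected = 0;
        return state_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    // Release with a store that publishes the critical section's writes.
    void unlock()
    {
        state_.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> state_{0};
};
```

Whether this trade-off is acceptable depends on whether the protected code can ever be reached from a signal handler.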
 
 OTOH, I'm not sure why libthr needs to use non-standard lock hooks at this point
 as they don't seem to be markedly different from the ones rtld uses.
 
 -- 
 John Baldwin

From: Kostik Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org, guillaume@morinfr.org, kan@freebsd.org,
        davidxu@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 23 Apr 2010 17:41:11 +0300

 --A47bNRIYjYQgpFVi
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Fri, Apr 23, 2010 at 10:21:41AM -0400, John Baldwin wrote:
 > On Friday 23 April 2010 9:47:40 am Kostik Belousov wrote:
 > > On Fri, Apr 23, 2010 at 08:43:41AM -0400, John Baldwin wrote:
 > > > On Friday 23 April 2010 8:25:01 am Kostik Belousov wrote:
 > > > > On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 > > > > > I tracked the sigprocmask() system calls down to the operations to
 > > > > > acquire a write lock in the runtime linker. The lock was added to fix
 > > > > > an earlier bug with throwing exceptions in multithreaded C++ apps. The
 > > > > > relevant commit that added the lock is this:
 > > > > >
 > > > > >    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 > > > > >
 > > > > > Are exceptions permitted during a signal handler? If not, then in
 > > > > > theory we would not need to invoke sigprocmask() for this particular
 > > > > > lock perhaps? I'm not sure how easy that would be to achieve given the
 > > > > > hooks to allow the thread library to overload the locking routines.
 > > > > > Also, this doesn't explain the lack of sigprocmask() calls under i386.
 > > > > > FreeBSD/i386 should be using the same locking code and thus invoking
 > > > > > sigprocmask() for each exception as well.
 > > > > 
 > > > > Throwing an exception during asynchronous signal execution raises undefined
 > > > > behaviour, AFAIK. sigprocmask() is there to support libc_r, and cannot
 > > > > be removed as far as we need to provide FreeBSD 4.x compatibility.
 > > > 
 > > > Hmmm.  Why does libthr use sigprocmask() for its rtld locks then?  Is that 
 > > > just a copy-paste from libc_r that can be removed now?
 > > 
 > > Hmmm^2. It seems it is there to prevent recursive entry into rtld from
 > > signal handler, that may reference yet unresolved symbol, e.g. libc
 > > syscall wrapper, from PLT. So my patch is wrong.
 > 
 > Presumably we could use a different type of lock that doesn't use
 > sigprocmask() to serialize calls to dl_iterate_phdr()? I'm not sure if
 > libthr would really need to overwrite the behavior of that lock or if
 > a simple atomic_cmpset()-based mutex would always be fine.
 During my porting of libunwind, I was told by libunwind maintainer
 that they have to call dl_iterate_phdr() from signal context to
 unwind, if libunwind is called from signal context.
 
 Apparently, glibc's dl_iterate_phdr() is not signal-safe, while ours is.
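
For readers unfamiliar with the interface under discussion: dl_iterate_phdr() walks the list of loaded objects and invokes a callback for each, which is how an unwinder locates the unwind tables for a given address. The sketch below only counts the objects. The helper names count_object and count_loaded_objects are hypothetical, and the example assumes a system whose <link.h> declares dl_iterate_phdr(), as FreeBSD and glibc do.

```cpp
#include <link.h>
#include <cstddef>

// Callback invoked once per loaded object (executable, shared libraries).
// `data` is the user pointer passed to dl_iterate_phdr(); here, a counter.
static int count_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)info;
    (void)size;
    ++*static_cast<int *>(data);
    return 0;  // returning non-zero would stop the iteration early
}

// Hypothetical helper: returns how many objects the runtime linker reports.
static int count_loaded_objects()
{
    int count = 0;
    dl_iterate_phdr(count_object, &count);
    return count;
}
```

Because the runtime linker has to keep the object list stable for the duration of the walk, every call takes a linker lock, and that lock is where the per-exception cost in this report comes from.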
 >
 > OTOH, I'm not sure why libthr needs to use non-standard lock hooks at
 > this point as they don't seem to be markedly different from the ones
 > rtld uses.
 
 libthr locks provide exclusion both for other kernel-executed threads
 and signal handlers, while the rtld-default locks only protect against
 signal handlers and thus libc_r-style threads.
 
 --A47bNRIYjYQgpFVi
 Content-Type: application/pgp-signature
 Content-Disposition: inline
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (FreeBSD)
 
 iEYEARECAAYFAkvRsYYACgkQC3+MBN1Mb4jm5QCg8l0OCcuqNiutS2fpF84GQ7rW
 1TcAoNwW+edk57r3KM/RaOBFybdHivHi
 =JtDQ
 -----END PGP SIGNATURE-----
 
 --A47bNRIYjYQgpFVi--

From: John Baldwin <jhb@freebsd.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: bug-followup@freebsd.org,
 guillaume@morinfr.org,
 kan@freebsd.org,
 davidxu@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Thu, 8 Jul 2010 11:29:50 -0400

 On Friday, April 23, 2010 10:41:11 am Kostik Belousov wrote:
 > On Fri, Apr 23, 2010 at 10:21:41AM -0400, John Baldwin wrote:
 > > On Friday 23 April 2010 9:47:40 am Kostik Belousov wrote:
 > > > On Fri, Apr 23, 2010 at 08:43:41AM -0400, John Baldwin wrote:
 > > > > On Friday 23 April 2010 8:25:01 am Kostik Belousov wrote:
 > > > > > On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 > > > > > > I tracked the sigprocmask() system calls down to the operations to
 > > > > > > acquire a write lock in the runtime linker. The lock was added to fix
 > > > > > > an earlier bug with throwing exceptions in multithreaded C++ apps. The
 > > > > > > relevant commit that added the lock is this:
 > > > > > >
 > > > > > >    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 > > > > > >
 > > > > > > Are exceptions permitted during a signal handler? If not, then in
 > > > > > > theory we would not need to invoke sigprocmask() for this particular
 > > > > > > lock perhaps? I'm not sure how easy that would be to achieve given the
 > > > > > > hooks to allow the thread library to overload the locking routines.
 > > > > > > Also, this doesn't explain the lack of sigprocmask() calls under i386.
 > > > > > > FreeBSD/i386 should be using the same locking code and thus invoking
 > > > > > > sigprocmask() for each exception as well.
 > > > > > 
 > > > > > Throwing an exception during asynchronous signal execution raises undefined
 > > > > > behaviour, AFAIK. sigprocmask() is there to support libc_r, and cannot
 > > > > > be removed as far as we need to provide FreeBSD 4.x compatibility.
 > > > > 
 > > > > Hmmm.  Why does libthr use sigprocmask() for its rtld locks then?  Is that 
 > > > > just a copy-paste from libc_r that can be removed now?
 > > > 
 > > > Hmmm^2. It seems it is there to prevent recursive entry into rtld from
 > > > signal handler, that may reference yet unresolved symbol, e.g. libc
 > > > syscall wrapper, from PLT. So my patch is wrong.
 > > 
 > > Presumably we could use a different type of lock that doesn't use
 > > sigprocmask() to serialize calls to dl_iterate_phdr()? I'm not sure if
 > > libthr would really need to overwrite the behavior of that lock or if
 > > a simple atomic_cmpset()-based mutex would always be fine.
 > During my porting of libunwind, I was told by libunwind maintainer
 > that they have to call dl_iterate_phdr() from signal context to
 > unwind, if libunwind is called from signal context.
 > 
 > Apparently, glibc's dl_iterate_phdr() is not signal-safe, while ours is.
 
 [Revisiting this]
 
 Do we know of any use cases where libunwind would be used from a signal
 handler?  Could we instead simply declare it to be an unsafe API in a signal
 context?  longjmp(3) isn't safe in a signal context and throwing exceptions
 in a signal handler is undefined, so declaring libunwind to similarly be
 unsafe may be fine.
 
 > > OTOH, I'm not sure why libthr needs to use non-standard lock hooks at
 > > this point as they don't seem to be markedly different from the ones
 > > rtld uses.
 > 
 > libthr locks provide exclusion both for other kernel-executed threads
 > and signal handlers, while the rtld-default locks only protect against
 > signal handlers and thus libc_r-style threads.
 
 Oh, bah.  The rtld locks do use atomic operations that are thread safe,
 but I missed that the 'oldsigmask' global needs to be per-thread.
 
 -- 
 John Baldwin
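 
 For illustration only (this sketch is not taken from rtld or libthr): the
 kind of lock discussed above blocks signals before taking an atomic flag, so
 a handler cannot re-enter the critical section on the same thread. The class
 name SigBlockingLock and its members are hypothetical. The saved mask is
 kept per thread, which is the point made about 'oldsigmask', and each
 lock/unlock pair costs two sigprocmask() system calls. A portable threaded
 program would probably call pthread_sigmask() instead.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>

// Hypothetical sketch of a signal-blocking spin lock; not rtld code.
class SigBlockingLock {
public:
    void lock() {
        sigset_t all, old;
        sigfillset(&all);
        // First system call: block signals so a handler cannot re-enter.
        sigprocmask(SIG_BLOCK, &all, &old);
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Spin; a real lock would yield or sleep here.
        }
        // The previous mask belongs to the calling thread only.
        saved_mask_ = old;
    }

    void unlock() {
        sigset_t old = saved_mask_;
        flag_.clear(std::memory_order_release);
        // Second system call: restore the mask seen before lock().
        sigprocmask(SIG_SETMASK, &old, nullptr);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    static thread_local sigset_t saved_mask_;
};

thread_local sigset_t SigBlockingLock::saved_mask_;

// Returns true if SIGUSR1 is blocked while the lock is held and the
// original mask is back in place afterwards.
bool lock_blocks_signals() {
    SigBlockingLock l;
    sigset_t before, during, after;
    sigprocmask(SIG_SETMASK, nullptr, &before);  // query only
    l.lock();
    sigprocmask(SIG_SETMASK, nullptr, &during);
    l.unlock();
    sigprocmask(SIG_SETMASK, nullptr, &after);
    return sigismember(&during, SIGUSR1) == 1 &&
           sigismember(&after, SIGUSR1) == sigismember(&before, SIGUSR1);
}
```

 With a single global for the saved mask instead of thread_local storage, two
 threads taking the lock in turn could restore each other's masks.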

From: Kostik Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org, guillaume@morinfr.org, kan@freebsd.org,
        davidxu@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 9 Jul 2010 15:56:34 +0300

 On Thu, Jul 08, 2010 at 11:29:50AM -0400, John Baldwin wrote:
 > On Friday, April 23, 2010 10:41:11 am Kostik Belousov wrote:
 > > On Fri, Apr 23, 2010 at 10:21:41AM -0400, John Baldwin wrote:
 > > > On Friday 23 April 2010 9:47:40 am Kostik Belousov wrote:
 > > > > On Fri, Apr 23, 2010 at 08:43:41AM -0400, John Baldwin wrote:
 > > > > > On Friday 23 April 2010 8:25:01 am Kostik Belousov wrote:
 > > > > > > On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 > > > > > > > I tracked the sigprocmask() system calls down to the operations to
 > > > > > > > acquire a write lock in the runtime linker. The lock was added to fix
 > > > > > > > an earlier bug with throwing exceptions in multithreaded C++ apps. The
 > > > > > > > relevant commit that added the lock is this:
 > > > > > > >
 > > > > > > >    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 > > > > > > >
 > > > > > > > Are exceptions permitted during a signal handler? If not, then in
 > > > > > > > theory we would not need to invoke sigprocmask() for this particular
 > > > > > > > lock perhaps? I'm not sure how easy that would be to achieve given the
 > > > > > > > hooks to allow the thread library to overload the locking routines.
 > > > > > > > Also, this doesn't explain the lack of sigprocmask() calls under i386.
 > > > > > > > FreeBSD/i386 should be using the same locking code and thus invoking
 > > > > > > > sigprocmask() for each exception as well.
 > > > > > > 
 > > > > > > Throwing an exception during asynchronous signal execution results in
 > > > > > > undefined behaviour, AFAIK. sigprocmask() is there to support libc_r, and
 > > > > > > cannot be removed as long as we need to provide FreeBSD 4.x compatibility.
 > > > > > 
 > > > > > Hmmm.  Why does libthr use sigprocmask() for its rtld locks then?  Is that
 > > > > > just a copy-paste from libc_r that can be removed now?
 > > > > 
 > > > > Hmmm^2. It seems it is there to prevent recursive entry into rtld from
 > > > > signal handler, that may reference yet unresolved symbol, e.g. libc
 > > > > syscall wrapper, from PLT. So my patch is wrong.
 > > > 
 > > > Presumably we could use a different type of lock that doesn't use
 > > > sigprocmask() to serialize calls to dl_iterate_phdr()? I'm not sure if
 > > > libthr would really need to overwrite the behavior of that lock or if
 > > > a simple atomic_cmpset()-based mutex would always be fine.
 > > During my porting of libunwind, I was told by libunwind maintainer
 > > that they have to call dl_iterate_phdr() from signal context to
 > > unwind, if libunwind is called from signal context.
 > > 
 > > Apparently, glibc's dl_iterate_phdr() is not signal-safe, while ours is.
 > 
 > [Revisiting this]
 > 
 > Do we know of any use cases where libunwind would be used from a signal
 > handler?  Could we instead simply declare it to be an unsafe API in a signal
 > context?  longjmp(3) isn't safe in a signal context and throwing exceptions
 > in a signal handler is undefined, so declaring libunwind to similarly be
 > unsafe may be fine.
 Yes, one of the typical uses of libunwind is profiling, where the backtrace
 is taken exactly in a signal handler. On the other hand, Linux is unsafe
 there.
 
 Maybe we should provide an environment variable that flips the
 safety?
 > 
 > > > OTOH, I'm not sure why libthr needs to use non-standard lock hooks at
 > > > this point as they don't seem to be markedly different from the ones
 > > > rtld uses.
 > > 
 > > libthr locks provide exclusion both for other kernel-executed threads
 > > and signal handlers, while the rtld-default locks only protect against
 > > signal handlers and thus libc_r-style threads.
 > 
 > Oh, bah.  The rtld locks do use atomic operations that are thread safe,
 > but I missed that the 'oldsigmask' global needs to be per-thread.
 > 
 > -- 
 > John Baldwin

From: David Xu <davidxu@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Kostik Belousov <kostikbel@gmail.com>, bug-followup@freebsd.org,
        guillaume@morinfr.org, kan@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 27 Aug 2010 16:20:03 +0000

 John Baldwin wrote:
 > On Friday, April 23, 2010 10:41:11 am Kostik Belousov wrote:
 >> On Fri, Apr 23, 2010 at 10:21:41AM -0400, John Baldwin wrote:
 >>> On Friday 23 April 2010 9:47:40 am Kostik Belousov wrote:
 >>>> On Fri, Apr 23, 2010 at 08:43:41AM -0400, John Baldwin wrote:
 >>>>> On Friday 23 April 2010 8:25:01 am Kostik Belousov wrote:
 >>>>>> On Thu, Apr 22, 2010 at 04:09:34PM -0400, John Baldwin wrote:
 >>>>>>> I tracked the sigprocmask() system calls down to the operations to
 >>>>>>> acquire a write lock in the runtime linker. The lock was added to fix
 >>>>>>> an earlier bug with throwing exceptions in multithreaded C++ apps. The
 >>>>>>> relevant commit that added the lock is this:
 >>>>>>>
 >>>>>>>    http://svn.freebsd.org/viewvc/base?view=revision&revision=178807
 >>>>>>>
 >>>>>>> Are exceptions permitted during a signal handler? If not, then in
 >>>>>>> theory we would not need to invoke sigprocmask() for this particular
 >>>>>>> lock perhaps? I'm not sure how easy that would be to achieve given the
 >>>>>>> hooks to allow the thread library to overload the locking routines.
 >>>>>>> Also, this doesn't explain the lack of sigprocmask() calls under i386.
 >>>>>>> FreeBSD/i386 should be using the same locking code and thus invoking
 >>>>>>> sigprocmask() for each exception as well.
 >>>>>> Throwing an exception during asynchronous signal execution results in
 >>>>>> undefined behaviour, AFAIK. sigprocmask() is there to support libc_r, and
 >>>>>> cannot be removed as long as we need to provide FreeBSD 4.x compatibility.
 >>>>> Hmmm.  Why does libthr use sigprocmask() for its rtld locks then?  Is that 
 >>>>> just a copy-paste from libc_r that can be removed now?
 >>>> Hmmm^2. It seems it is there to prevent recursive entry into rtld from
 >>>> signal handler, that may reference yet unresolved symbol, e.g. libc
 >>>> syscall wrapper, from PLT. So my patch is wrong.
 >>> Presumably we could use a different type of lock that doesn't use
 >>> sigprocmask() to serialize calls to dl_iterate_phdr()? I'm not sure if
 >>> libthr would really need to overwrite the behavior of that lock or if
 >>> a simple atomic_cmpset()-based mutex would always be fine.
 >> During my porting of libunwind, I was told by libunwind maintainer
 >> that they have to call dl_iterate_phdr() from signal context to
 >> unwind, if libunwind is called from signal context.
 >>
 >> Apparently, glibc's dl_iterate_phdr() is not signal-safe, while ours is.
 > 
 > [Revisiting this]
 > 
 > Do we know of any use cases where libunwind would be used from a signal
 > handler?  Could we instead simply declare it to be an unsafe API in a signal
 > context?  longjmp(3) isn't safe in a signal context and throwing exceptions
 > in a signal handler is undefined, so declaring libunwind to similarly be
 > unsafe may be fine.
 > 
 >>> OTOH, I'm not sure why libthr needs to use non-standard lock hooks at
 >>> this point as they don't seem to be markedly different from the ones
 >>> rtld uses.
 >> libthr locks provide exclusion both for other kernel-executed threads
 >> and signal handlers, while the rtld-default locks only protect against
 >> signal handlers and thus libc_r-style threads.
 > 
 > Oh, bah.  The rtld locks do use atomic operations that are thread safe,
 > but I missed that the 'oldsigmask' global needs to be per-thread.
 > 
 
 I am currently testing a signal wrapper patch for libthr. As a side
 effect, the patch eliminates the need for sigprocmask() in the rtld
 lock.
 
 The time taken by the example on my machine is:
  > time ./testexcept
 
 0.437u 0.000s 0:00.43 100.0%	5+5120k 0+0io 0pf+0w
 
 The problem still exists if the program does not create a second thread,
 because I have trouble enabling libthr's rtld lock in
 _libpthread_init(), which has __attribute__ ((constructor)). That
 means rtld is in a critical region when it runs, so I cannot call
 _thr_rtld_init() to set the rtld locks at that time: a chicken-and-egg
 problem.
 
 The patch is mainly meant to fix a thread cancellation race that is
 caused by signals. Yes, signals are always a pain for a thread
 library.
 
 http://people.freebsd.org/~davidxu/patch/signal_wrapper.patch
 

From: David Xu <davidxu@freebsd.org>
To: bug-followup@FreeBSD.org, guillaume@morinfr.org
Cc: Kostik Belousov <kostikbel@gmail.com>, John Baldwin <jhb@freebsd.org>
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Sun, 29 Aug 2010 14:55:32 +0800

 Without the previous signal wrapper patch I posted (I am not sure
 I will use it, because it is too complex), I think there is another way
 to avoid sigprocmask(). I once wrote a system call
 
 sc_shared_t	*schedctl(void);
 
 
 which returns a data area shared between userland and the kernel.
 Userland code sets a flag in the data area to disable signal delivery.
 When kernel code wants to deliver a signal, it checks the flag and
 does not deliver the signal if the flag is set. That would fix the
 problem:
 http://people.freebsd.org/~davidxu/schedctl/
 
 

From: Kostik Belousov <kostikbel@gmail.com>
To: David Xu <davidxu@freebsd.org>
Cc: bug-followup@freebsd.org, guillaume@morinfr.org,
        John Baldwin <jhb@freebsd.org>
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Sun, 29 Aug 2010 14:57:56 +0300

 On Sun, Aug 29, 2010 at 02:55:32PM +0800, David Xu wrote:
 > Without the previous signal wrapper patch I posted (I am not sure
 > I will use it, because it is too complex), I think there is another way
 > to avoid sigprocmask(). I once wrote a system call
 > 
 > sc_shared_t	*schedctl(void);
 > 
 > 
 > which returns a data area shared between userland and the kernel.
 > Userland code sets a flag in the data area to disable signal delivery.
 > When kernel code wants to deliver a signal, it checks the flag and
 > does not deliver the signal if the flag is set. That would fix the
 > problem:
 > http://people.freebsd.org/~davidxu/schedctl/
 > 
 I only skimmed over the (incomplete) change. It seems to have issues
 with rfork(), in particular when a vm space shared between two processes
 gets forked.
 
 Also, it is not clear to me what would happen if the shared page is paged
 out or user mode explicitly munmap(2)s the shared region. At least the kernel
 mapping should be invalidated, otherwise the kernel might modify random memory.
 
 I do not like the idea of using additional non-observable state bits,
 in addition to the signal mask, to block signal delivery. IMHO, it
 subverts the signal mechanism and, in case of memory corruption, makes
 debugging too hard.

From: David Xu <davidxu@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Kostik Belousov <kostikbel@gmail.com>, bug-followup@freebsd.org,
        guillaume@morinfr.org, kan@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Tue, 14 Sep 2010 14:00:13 +0000

 John Baldwin wrote:
 
 > Do we know of any use cases where libunwind would be used from a signal
 > handler?  Could we instead simply declare it to be an unsafe API in a signal
 > context?  longjmp(3) isn't safe in a signal context and throwing exceptions
 > in a signal handler is undefined, so declaring libunwind to similarly be
 > unsafe may be fine.
 > 
 
 It is true that libunwind can be used from a signal handler.
 In fact, I ran into this when I was working on stack unwinding support
 for libthr.
 
 The reason I was working on it is that I want C++ on-stack objects to
 be destructed when a thread exits. Otherwise a C++ program cannot use
 the pthread cancellation feature: pthread cancellation calls
 pthread_exit(), and that function should unwind the thread's stack for
 C++-like languages, otherwise the program leaks resources.
 
 In the head branch things are improved. In deferred mode, thread
 cancellation is called from an in-place context, but in asynchronous
 mode it is called from a signal handler, the SIGCANCEL handler, so
 libunwind needs to dig out the saved context and unwind the
 interrupted stack.
 
 The very bad news is that libunwind only implemented
 unwind-through-signal-stack for Linux; nothing has been done for
 FreeBSD and others. The code can be found here:
 /usr/src/contrib/gcc/config/i386/linux-unwind.h
 
 I even have a patch for FreeBSD x86 to support
 unwind-through-signal-stack, but I have not fully tested it.
 http://people.freebsd.org/~davidxu/patch/unwind.patch
 You can say this is a crazy idea, but they did it.
 
 >>> OTOH, I'm not sure why libthr needs to use non-standard lock hooks at
 >>> this point as they don't seem to be markedly different from the ones
 >>> rtld uses.
 >> libthr locks provide exclusion both for other kernel-executed threads
 >> and signal handlers, while the rtld-default locks only protect against
 >> signal handlers and thus libc_r-style threads.
 > 
 > Oh, bah.  The rtld locks do use atomic operations that are thread safe,
 > but I missed that the 'oldsigmask' global needs to be per-thread.
 > 
 
 In the head branch, when a program is linked with libthr and has created
 a thread, libthr's rtld lock implementation is activated and
 performance should be improved. Otherwise, it is still slow for a
 non-threaded C++ program.

From: Tijl Coosemans <tijl@coosemans.org>
To: jhb@freebsd.org
Cc: bug-followup@freebsd.org, guillaume@morinfr.org, kan@freebsd.org,
        davidxu@freebsd.org, kostikbel@gmail.com
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Tue, 14 Sep 2010 11:35:48 +0200

 On Thu, Jul 08, 2010 at 11:29:50AM -0400, John Baldwin wrote:
 > ...longjmp(3) isn't safe in a signal context...
 
 POSIX says it's supposed to be safe:
 
   "As it bypasses the usual function call and return mechanisms,
   longjmp() shall execute correctly in contexts of interrupts, signals,
   and any of their associated functions. However, if longjmp() is
   invoked from a nested signal handler (that is, from a function
   invoked as a result of a signal raised during the handling of another
   signal), the behavior is undefined."
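 
 For illustration only (not part of the original message): a minimal sketch
 of leaving a non-nested signal handler with a non-local jump. It uses
 sigsetjmp()/siglongjmp() rather than plain setjmp()/longjmp(), so the signal
 mask is restored as well; that substitution and the names g_env, on_usr1 and
 jump_out_of_handler are choices made for this sketch. In C++ this is only
 reasonable when the jump skips no objects with non-trivial destructors.

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_env;  // jump target, set up by jump_out_of_handler()

static void on_usr1(int) {
    // Not nested inside another handler, which is the case the POSIX
    // text quoted above describes as well defined.
    siglongjmp(g_env, 1);
}

// Returns 1 if control came back through the jump, 0 otherwise.
int jump_out_of_handler() {
    struct sigaction sa = {}, old = {};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, &old);

    // savemask = 1: the signal mask is saved here and restored by
    // siglongjmp(), so SIGUSR1 is not left blocked after the jump.
    if (sigsetjmp(g_env, 1) == 0) {
        raise(SIGUSR1);                     // the handler jumps back above
        sigaction(SIGUSR1, &old, nullptr);  // reached only without a jump
        return 0;
    }
    sigaction(SIGUSR1, &old, nullptr);      // restore previous disposition
    return 1;
}
```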

From: Kostik Belousov <kostikbel@gmail.com>
To: David Xu <davidxu@freebsd.org>
Cc: John Baldwin <jhb@freebsd.org>, bug-followup@freebsd.org,
        guillaume@morinfr.org, kan@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Tue, 14 Sep 2010 14:12:12 +0300

 On Tue, Sep 14, 2010 at 02:00:13PM +0000, David Xu wrote:
 > John Baldwin wrote:
 > 
 > >Do we know of any use cases where libunwind would be used from a signal
 > >handler?  Could we instead simply declare it to be an unsafe API in a signal
 > >context?  longjmp(3) isn't safe in a signal context and throwing exceptions
 > >in a signal handler is undefined, so declaring libunwind to similarly be
 > >unsafe may be fine.
 > >
 > 
 > It is true that libunwind can be used from a signal handler.
 > In fact, I ran into this when I was working on stack unwinding support
 > for libthr.
 > 
 > The reason I was working on it is that I want C++ on-stack objects to
 > be destructed when a thread exits. Otherwise a C++ program cannot use
 > the pthread cancellation feature: pthread cancellation calls
 > pthread_exit(), and that function should unwind the thread's stack for
 > C++-like languages, otherwise the program leaks resources.
 > 
 > In the head branch things are improved. In deferred mode, thread
 > cancellation is called from an in-place context, but in asynchronous
 > mode it is called from a signal handler, the SIGCANCEL handler, so
 > libunwind needs to dig out the saved context and unwind the
 > interrupted stack.
 > 
 > The very bad news is that libunwind only implemented
 > unwind-through-signal-stack for Linux; nothing has been done for
 > FreeBSD and others. The code can be found here:
 > /usr/src/contrib/gcc/config/i386/linux-unwind.h
 Err? When I ported libunwind, I spent a lot of time getting unwinding
 through the signal frame to work. Part of the trouble was that
 our signal trampoline lacks unwind info. Annotating the trampoline
 is not a complete solution either, since libunwind can only find the
 FDEs through the .eh_frame_hdr of some dso. This would require creating
 a fake dso for the trampolines.
 
 I decided to use the old Linux-style method of unwinding: hardcoding the
 frame format and detecting the trampoline by its code sequence.
 
 > 
 > I even have a patch for FreeBSD x86 to support
 > unwind-through-signal-stack, but I have not fully tested it.
 > http://people.freebsd.org/~davidxu/patch/unwind.patch
 > You can say this is a crazy idea, but they did it.
 > 
 > >>>OTOH, I'm not sure why libthr needs to use non-standard lock hooks at
 > >>>this point as they don't seem to be markedly different from the ones
 > >>>rtld uses.
 > >>libthr locks provide exclusion both for other kernel-executed threads
 > >>and signal handlers, while the rtld-default locks only protect against
 > >>signal handlers and thus libc_r-style threads.
 > >
 > >Oh, bah.  The rtld locks do use atomic operations that are thread safe,
 > >but I missed that the 'oldsigmask' global needs to be per-thread.
 > >
 > 
 > In the head branch, when a program is linked with libthr and has created
 > a thread, libthr's rtld lock implementation is activated and
 > performance should be improved. Otherwise, it is still slow for a
 > non-threaded C++ program.
 
 BTW, the signal handler interposition that you implemented for libthr
 probably belongs in libc. I already implemented dso filtering for
 our rtld, so I hope to start a discussion about merging libc and libthr
 into a single library. Then libc could interpose signal handlers.

From: John Baldwin <jhb@freebsd.org>
To: bug-followup@freebsd.org,
 guillaume@morinfr.org
Cc: kib@freebsd.org,
 theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 28 Jun 2013 08:47:55 -0400

 Looking at this again, the patch committed in 178807 is just wrong and should 
 be reverted.  There is no state in rtld that needs to be protected via a write 
 lock.  GCC is too lazy to use their own locking to protect shared state 
 between threads and wants the runtime linker to enforce this.  Their 
 justification that glibc doesn't allow concurrent execution of this isn't a 
 valid excuse.  For an API like this that just walks a list and invokes a 
 callback, if the callback manipulates shared state owned by the caller, the 
 caller should be responsible for synchronizing access to it, not rtld!
 
 Instead I think we should apply the patch in the original GCC bug to our
 in-tree GCC and to our GCC ports.  This should remove the sigprocmask calls
 and not penalize other users of dl_iterate_phdr() for GCC's poor behavior.
 
 -- 
 John Baldwin

From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org, guillaume@morinfr.org, theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD
 7.1/amd64
Date: Fri, 28 Jun 2013 20:17:21 +0300

 On Fri, Jun 28, 2013 at 08:47:55AM -0400, John Baldwin wrote:
 > Looking at this again, the patch committed in 178807 is just wrong and should
 > be reverted.  There is no state in rtld that needs to be protected via a write
 > lock.  GCC is too lazy to use their own locking to protect shared state
 > between threads and wants the runtime linker to enforce this.  Their
 > justification that glibc doesn't allow concurrent execution of this isn't a
 > valid excuse.  For an API like this that just walks a list and invokes a
 > callback, if the callback manipulates shared state owned by the caller, the
 > caller should be responsible for synchronizing access to it, not rtld!
 > 
 > Instead I think we should apply the patch in the original GCC bug to our
 > in-tree GCC and to our GCC ports.  This should remove the sigprocmask calls
 > and not penalize other users of dl_iterate_phdr() for GCC's poor behavior.
 
 In other words, we should become knowingly incompatible with the stock
 GCC and with other consumers of dl_iterate_phdr(), like libunwind ?
 E.g. libunwind ability to unwind from the signal handler relies on
 this behaviour.

From: John Baldwin <jhb@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: bug-followup@freebsd.org,
 guillaume@morinfr.org,
 theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD 7.1/amd64
Date: Fri, 28 Jun 2013 13:38:39 -0400

 On Friday, June 28, 2013 1:17:21 pm Konstantin Belousov wrote:
 > On Fri, Jun 28, 2013 at 08:47:55AM -0400, John Baldwin wrote:
 > > Looking at this again, the patch committed in 178807 is just wrong and should 
 > > be reverted.  There is no state in rtld that needs to be protected via a write 
 > > lock.  GCC is too lazy to use their own locking to protect shared state 
 > > between threads and wants the runtime linker to enforce this.  Their 
 > > justification that glibc doesn't allow concurrent execution of this isn't a 
 > > valid excuse.  For an API like this that just walks a list and invokes a 
 > > callback, if the callback manipulates shared state owned by the caller, the 
 > > caller should be responsible for synchronizing access to it, not rtld!
 > > 
 > > Instead I think we should apply the patch in the original GCC bug to our
 > > in-tree GCC and to our GCC ports.  This should remove the sigprocmask calls
 > > and not penalize other users of dl_iterate_phdr() for GCC's poor behavior.
 > 
 > In other words, we should become knowingly incompatible with the stock
 > GCC and with other consumers of dl_iterate_phdr(), like libunwind ?
 > E.g. libunwind ability to unwind from the signal handler relies on
 > this behaviour.
 
 Does libunwind depend on rtld single-threading to lock state shared with
 other threads?  If it does it should manage that itself.  If GCC and/or
 libunwind want to share arbitrary state between threads, or protect state
 from being accessed by a signal handler, then GCC and/or libunwind should
 manage that.  rtld can't possibly (and shouldn't) know the rules about
 how that state that is private to GCC/libunwind is managed.
 
 What if you had a consumer of this that wanted to access the state outside
 of the callback?  Then it already has to manage all the locking itself to
 be safe anyway.
 
 Put another way, requiring dl_iterate_phdr() to provide locking for consumers
 would be equivalent to code assuming that C++'s for_each() template in
 <algorithm> provided locking to callers.  That is entirely upside-down.
 
 -- 
 John Baldwin
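 
 For illustration only (not part of the original message): a sketch of a
 dl_iterate_phdr() consumer that protects its own shared state instead of
 relying on rtld to serialize callers. ObjectCache, record_object and
 refresh_cache are hypothetical names. std::mutex is not async-signal-safe,
 so this sketch does not cover the signal-handler case raised for libunwind.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>   // dl_iterate_phdr(), struct dl_phdr_info
#include <mutex>
#include <string>
#include <vector>

// State owned by the caller, together with the caller's own lock.
struct ObjectCache {
    std::mutex mtx;
    std::vector<std::string> names;
};

// Callback invoked once per loaded object; 'arg' is the ObjectCache.
static int record_object(struct dl_phdr_info *info, size_t, void *arg) {
    auto *cache = static_cast<ObjectCache *>(arg);
    cache->names.emplace_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // zero tells dl_iterate_phdr() to keep walking
}

// Rebuilds the cache under the caller's lock and returns the number of
// objects seen. rtld only walks its list; the locking lives here.
size_t refresh_cache(ObjectCache &cache) {
    std::lock_guard<std::mutex> guard(cache.mtx);
    cache.names.clear();
    dl_iterate_phdr(record_object, &cache);
    return cache.names.size();
}
```

 Two threads calling refresh_cache() on the same ObjectCache are serialized
 by cache.mtx, whatever locking dl_iterate_phdr() itself does or does not do.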

From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: bug-followup@freebsd.org, guillaume@morinfr.org, theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD
 7.1/amd64
Date: Fri, 28 Jun 2013 20:45:38 +0300

 On Fri, Jun 28, 2013 at 01:38:39PM -0400, John Baldwin wrote:
 > On Friday, June 28, 2013 1:17:21 pm Konstantin Belousov wrote:
 > > On Fri, Jun 28, 2013 at 08:47:55AM -0400, John Baldwin wrote:
 > > > Looking at this again, the patch committed in 178807 is just wrong and should
 > > > be reverted.  There is no state in rtld that needs to be protected via a write
 > > > lock.  GCC is too lazy to use their own locking to protect shared state
 > > > between threads and wants the runtime linker to enforce this.  Their
 > > > justification that glibc doesn't allow concurrent execution of this isn't a
 > > > valid excuse.  For an API like this that just walks a list and invokes a
 > > > callback, if the callback manipulates shared state owned by the caller, the
 > > > caller should be responsible for synchronizing access to it, not rtld!
 > > > 
 > > > Instead I think we should apply the patch in the original GCC bug to our
 > > > in-tree GCC and to our GCC ports.  This should remove the sigprocmask calls
 > > > and not penalize other users of dl_iterate_phdr() for GCC's poor behavior.
 > > 
 > > In other words, we should become knowingly incompatible with the stock
 > > GCC and with other consumers of dl_iterate_phdr(), like libunwind ?
 > > E.g. libunwind ability to unwind from the signal handler relies on
 > > this behaviour.
 > 
 > Does libunwind depend on rtld single-threading to lock state shared with
 > other threads?  If it does it should manage that itself.  If GCC and/or
 > libunwind want to share arbitrary state between threads, or protect state
 > from being accessed by a signal handler, then GCC and/or libunwind should
 > manage that.  rtld can't possibly (and shouldn't) know the rules about
 > how that state that is private to GCC/libunwind is managed.
 libunwind depends on the dl_iterate_phdr() signal-safety.
 
 > 
 > What if you had a consumer of this that wanted to access the state outside
 > of the callback?  Then it already has to manage all the locking itself to
 > be safe anyway.
 > 
 > Put another way, requiring dl_iterate_phdr() to provide locking for consumers
 > would be equivalent to code assuming that C++'s for_each() template in
 > <algorithm> provided locking to callers.  That is entirely upside-down.
 
 I think I should revive the fast sigprocmask patch.

From: Jilles Tjoelker <jilles@stack.nl>
To: bug-followup@FreeBSD.org, Konstantin Belousov <kostikbel@gmail.com>,
	John Baldwin <jhb@freebsd.org>
Cc: guillaume@morinfr.org, theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD
 7.1/amd64
Date: Sat, 29 Jun 2013 23:53:35 +0200

 On Fri, 28 Jun 2013 20:45:38 +0300, Konstantin Belousov wrote:
 > On Fri, Jun 28, 2013 at 01:38:39PM -0400, John Baldwin wrote:
 > > On Friday, June 28, 2013 1:17:21 pm Konstantin Belousov wrote:
 > > > On Fri, Jun 28, 2013 at 08:47:55AM -0400, John Baldwin wrote:
 > > > > Looking at this again, the patch committed in 178807 is just
 > > > > wrong and should be reverted.  There is no state in rtld that
 > > > > needs to be protected via a write lock.  GCC is too lazy to use
 > > > > their own locking to protect shared state between threads and
 > > > > wants the runtime linker to enforce this.  Their justification
 > > > > that glibc doesn't allow concurrent execution of this isn't a
 > > > > valid excuse.  For an API like this that just walks a list and
 > > > > invokes a callback, if the callback manipulates shared state
 > > > > owned by the caller, the caller should be responsible for
 > > > > synchronizing access to it, not rtld!
 
 > > > > Instead I think we should apply the patch in the original GCC
 > > > > bug to our in-tree GCC and to our GCC ports.  This should
 > > > > remove the sigprocmask calls and not penalize other users of
 > > > > dl_iterate_phdr() for GCC's poor behavior.
 
 > > > In other words, we should become knowingly incompatible with the stock
 > > > GCC and with other consumers of dl_iterate_phdr(), like libunwind ?
 > > > E.g. libunwind ability to unwind from the signal handler relies on
 > > > this behaviour.
 
 > > Does libunwind depend on rtld single-threading to lock state shared
 > > with other threads?  If it does it should manage that itself.  If
 > > GCC and/or libunwind want to share arbitrary state between threads,
 > > or protect state from being accessed by a signal handler, then GCC
 > > and/or libunwind should manage that.  rtld can't possibly (and
 > > shouldn't) know the rules about how that state that is private to
 > > GCC/libunwind is managed.
 
 > libunwind depends on the dl_iterate_phdr() signal-safety.
 
 > > What if you had a consumer of this that wanted to access the state
 > > outside of the callback?  Then it already has to manage all the
 > > locking itself to be safe anyway.
 
 > > Put another way, requiring dl_iterate_phdr() to provide locking
 > > for consumers would be equivalent to code assuming that C++'s
 > > for_each() template in <algorithm> provided locking to callers.
 > > That is entirely upside-down.
 
 > I think I should revive the fast sigprocmask patch.
 
 We could add a version of dl_iterate_phdr that does not call
 sigprocmask() and use it in patched GCC/libunwind/etc. The patched GCC
 and libunwind can then avoid the sigprocmask call (possibly at the cost
 of less efficient caching) while unpatched GCC and libunwind continue
 to work.
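 
 To illustrate the argument quoted above, here is a minimal sketch of a
 dl_iterate_phdr() consumer that protects its own shared state instead
 of relying on rtld to serialize callers. The names object_stats,
 count_object and count_loaded_objects are made up for this example and
 are not taken from GCC or libunwind.

```c
#define _GNU_SOURCE   /* needed on glibc for dl_iterate_phdr(); harmless elsewhere */
#include <assert.h>
#include <link.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical caller-owned state shared between threads. */
struct object_stats {
    pthread_mutex_t lock;   /* taken by the caller's code, not by rtld */
    int objects;            /* number of loaded objects seen so far */
    int segments;           /* total number of program headers seen */
};

/* Callback invoked by dl_iterate_phdr() once per loaded object. */
static int
count_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct object_stats *st = data;

    (void)size;
    pthread_mutex_lock(&st->lock);
    st->objects++;
    st->segments += info->dlpi_phnum;
    pthread_mutex_unlock(&st->lock);
    return 0;   /* zero tells dl_iterate_phdr() to keep iterating */
}

/* Walk all loaded objects and return how many have been counted. */
static int
count_loaded_objects(struct object_stats *st)
{
    int n;

    assert(st != NULL);
    dl_iterate_phdr(count_object, st);
    pthread_mutex_lock(&st->lock);
    n = st->objects;
    pthread_mutex_unlock(&st->lock);
    return n;
}
```

 A caller would initialize the lock with PTHREAD_MUTEX_INITIALIZER and
 pass the structure to count_loaded_objects(). Note that a mutex only
 covers the multi-threaded case: it is not async-signal-safe, so this
 sketch does not address the signal-handler case that was raised for
 libunwind.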
 
 I am a bit concerned, though, that this is only needed for the
 unthreaded programming environment. libthr has an efficient method for
 postponing signals that avoids system calls. Moving that mechanism to
 libc, although it is a bit hard, may be an option.
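 
 The general idea of postponing signals without a system call can be
 sketched as follows. This is only an illustration of the technique, not
 libthr's actual code; every name in it is hypothetical, it remembers at
 most one postponed signal, and it does not support nested regions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t in_critical;  /* nonzero while signals are postponed */
static volatile sig_atomic_t pending_sig;  /* postponed signal number, or 0 */
static volatile sig_atomic_t handled;      /* number of signals actually processed */

/* Stand-in for the application's real signal handling work. */
static void
real_handler(int sig)
{
    (void)sig;
    handled++;
}

/* Installed via sigaction(); defers the work while in_critical is set. */
static void
wrapper_handler(int sig)
{
    if (in_critical) {
        pending_sig = sig;   /* only the latest signal is remembered */
        return;
    }
    real_handler(sig);
}

/* Enter the protected region: a plain store, no sigprocmask() call. */
static void
critical_enter(void)
{
    assert(!in_critical);    /* nesting is not supported in this sketch */
    in_critical = 1;
}

/* Leave the protected region and run any signal postponed meanwhile. */
static void
critical_leave(void)
{
    int sig;

    in_critical = 0;
    sig = pending_sig;
    if (sig != 0) {
        pending_sig = 0;
        real_handler(sig);
    }
}

/* Route the given signal through wrapper_handler(). */
static int
install_handler(int sig)
{
    struct sigaction sa;

    sa.sa_handler = wrapper_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

 The cost of entering and leaving the region is two memory stores
 instead of two sigprocmask() system calls, which is the kind of saving
 at stake for every thrown exception in this PR.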
 
 -- 
 Jilles Tjoelker

From: Konstantin Belousov <kostikbel@gmail.com>
To: Jilles Tjoelker <jilles@stack.nl>
Cc: bug-followup@FreeBSD.org, John Baldwin <jhb@freebsd.org>,
        guillaume@morinfr.org, theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD
 7.1/amd64
Date: Sun, 30 Jun 2013 06:12:32 +0300

 --suMD1gxl821dzEEX
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Sat, Jun 29, 2013 at 11:53:35PM +0200, Jilles Tjoelker wrote:
 > On Fri, 28 Jun 2013 20:45:38 +0300, Konstantin Belousov wrote:
 > > On Fri, Jun 28, 2013 at 01:38:39PM -0400, John Baldwin wrote:
 > > > On Friday, June 28, 2013 1:17:21 pm Konstantin Belousov wrote:
 > > > > On Fri, Jun 28, 2013 at 08:47:55AM -0400, John Baldwin wrote:
 > > > > > Looking at this again, the patch committed in 178807 is just
 > > > > > wrong and should be reverted.  There is no state in rtld that
 > > > > > needs to be protected via a write lock.  GCC is too lazy to use
 > > > > > their own locking to protect shared state between threads and
 > > > > > wants the runtime linker to enforce this.  Their justification
 > > > > > that glibc doesn't allow concurrent execution of this isn't a
 > > > > > valid excuse.  For an API like this that just walks a list and
 > > > > > invokes a callback, if the callback manipulates shared state
 > > > > > owned by the caller, the caller should be responsible for
 > > > > > synchronizing access to it, not rtld!
 >
 > > > > > Instead I think we should apply the patch in the original GCC
 > > > > > bug to our in-tree GCC and to our GCC ports.  This should
 > > > > > remove the sigprocmask calls and not penalize other users of
 > > > > > dl_iterate_phdr() for GCC's poor behavior.
 >
 > > > > In other words, we should become knowingly incompatible with the stock
 > > > > GCC and with other consumers of dl_iterate_phdr(), like libunwind ?
 > > > > E.g. libunwind ability to unwind from the signal handler relies on
 > > > > this behaviour.
 >
 > > > Does libunwind depend on rtld single-threading to lock state shared
 > > > with other threads?  If it does it should manage that itself.  If
 > > > GCC and/or libunwind want to share arbitrary state between threads,
 > > > or protect state from being accessed by a signal handler, then GCC
 > > > and/or libunwind should manage that.  rtld can't possibly (and
 > > > shouldn't) know the rules about how that state that is private to
 > > > GCC/libunwind is managed.
 >
 > > libunwind depends on the dl_iterate_phdr() signal-safety.
 >
 > > > What if you had a consumer of this that wanted to access the state
 > > > outside of the callback?  Then it already has to manage all the
 > > > locking itself to be safe anyway.
 >
 > > > Put another way, requiring dl_iterate_phdr() to provide locking
 > > > for consumers would be equivalent to code assuming that C++'s
 > > > for_each() template in <algorithm> provided locking to callers.
 > > > That is entirely upside-down.
 >
 > > I think I should revive the fast sigprocmask patch.
 >
 > We could add a version of dl_iterate_phdr that does not call
 > sigprocmask() and use it in patched GCC/libunwind/etc. The patched GCC
 > and libunwind can then avoid the sigprocmask call (possibly at the cost
 > of less efficient caching) while unpatched GCC and libunwind continue
 > to work.
 As I said, libunwind relies on the signal blocking behaviour to be able
 to unwind from the signal handler.
 
 >
 > I am a bit concerned, though, that this is only needed for the
 > unthreaded programming environment. libthr has an efficient method for
 > postponing signals that avoids system calls. Moving that mechanism to
 > libc, although it is a bit hard, may be an option.
 
 Well, the right answer then is, in fact, to merge libc and libthr.
 I implemented ELF filters as the first required step, but did not
 progress the task further.
 
 IMO the merge is mostly mechanical, the complication is due to the fact
 that the work should be done in a branch and takes a lot of time. As a
 result, the libc and libthr changes during development conflict and
 have to be constantly resolved.
 
 --suMD1gxl821dzEEX
 Content-Type: application/pgp-signature
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.20 (FreeBSD)
 
 iQIcBAEBAgAGBQJRz6IfAAoJEJDCuSvBvK1BMyMP/Agnvo15YjGSs3Vj3Dqy/ffE
 o1Pq3xxRSry2E2IUI04OlHR4223LMeUx/oSg5uNoZog3n0V7hWB2jYI3Kyo2n6cH
 0AP6kQTezqnrq8jDsEKeaUZIPqqZ7O1H86pnBNamMmUHfQ7+MzbCpt+D9EXPGmsR
 It5xn3SNFCvzqi8brsJXEir4XMQjFedj07BDjPlQsXsqqLkpDegHsJ7VCBpVStfn
 I4IAWwzJ8oiGrnEsRFKaYq1HvGvQp2S51WnzUt2Uv6PCpz0pAfRoIeYV8az879OQ
 33xU8eMceds8FnOtPsiIrNL/OO70tMPD2QHEFVLS5ltM/NTmabnFPl/5/2nMDZXs
 rik3A3DjjDMbHBENc1nU/2YQZertFsCrsOLb6h5sYS66NjF4yhPdq9EQtFoECOcZ
 Y73YhZgQxdTxIHNXCsRbWDzc2fMnhp5sDSfKtBBzJDV3u1lfYiSFk9FO/4LpK/lJ
 eyu4waFaqRKtWdAJPetMRjdAwotYBQcYEIHnSGEgIA11K6f2benF4a4X6eC8gnpi
 thAMteMEc7Nu3kCEgQz+DJUlj1gQqMbsvVzt/fW5lCPZ00whD1kpvvxPVnFTQ+mw
 uQngRaBGY5sNDXV5YZoXS45ltwcLHCX5MbhKWHV1shGCKW0QxNTEQdqEzaJUw/xA
 +9lwCRYbNzzWsHJmx6CS
 =tlId
 -----END PGP SIGNATURE-----
 
 --suMD1gxl821dzEEX--

From: Jilles Tjoelker <jilles@stack.nl>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: bug-followup@FreeBSD.org, John Baldwin <jhb@freebsd.org>,
	guillaume@morinfr.org, theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD
 7.1/amd64
Date: Sun, 30 Jun 2013 23:28:55 +0200

 On Sun, Jun 30, 2013 at 06:12:32AM +0300, Konstantin Belousov wrote:
 > On Sat, Jun 29, 2013 at 11:53:35PM +0200, Jilles Tjoelker wrote:
 > > On Fri, 28 Jun 2013 20:45:38 +0300, Konstantin Belousov wrote:
 > As I said, libunwind relies on the signal blocking behaviour to be able
 > to unwind from the signal handler.
 
 OK :(
 
 > > I am a bit concerned, though, that this is only needed for the
 > > unthreaded programming environment. libthr has an efficient method for
 > > postponing signals that avoids system calls. Moving that mechanism to
 > > libc, although it is a bit hard, may be an option.
 
 > Well, the right answer then is, in fact, to merge libc and libthr.
 > I implemented ELF filters as the first required step, but did not
 > progress the task further.
 
 > IMO the merge is mostly mechanical, the complication is due to the fact
 > that the work should be done in a branch and takes a lot of time. As a
 > result, the libc and libthr changes during development conflict and
 > have to be constantly resolved.
 
 A full merge would make people unhappy who want a separate unthreaded
 programming environment so that, for example, libstdc++ can allocate
 smaller data structures without locks. (Note that this requires breaking
 pthread_once() in the unthreaded programming environment.)
 
 However, even without pthread_create() and pthread_once(), a lot of
 functionality could be moved from libthr to libc, assuming we are
 willing to declare mixing and matching libc and libthr versions
 completely unsupported (by adding a check). For example, the signal
 handling, cancellation checks and errno. In the case of dynamic linking,
 a partial merge will require fewer symbols to be exported in
 FBSDprivate_1.0, which reduces PLT indirection and will make up for some
 overhead.
 
 An example of this partial merge is lib/libc/gen/sem_new.c.
 
 -- 
 Jilles Tjoelker

From: Konstantin Belousov <kostikbel@gmail.com>
To: Jilles Tjoelker <jilles@stack.nl>
Cc: bug-followup@FreeBSD.org, John Baldwin <jhb@freebsd.org>,
        guillaume@morinfr.org, theraven@freebsd.org
Subject: Re: kern/131597: [kernel] c++ exceptions very slow on FreeBSD
 7.1/amd64
Date: Mon, 1 Jul 2013 03:24:06 +0300

 --tdVUlQTxnujgnAno
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Sun, Jun 30, 2013 at 11:28:55PM +0200, Jilles Tjoelker wrote:
 > On Sun, Jun 30, 2013 at 06:12:32AM +0300, Konstantin Belousov wrote:
 > > On Sat, Jun 29, 2013 at 11:53:35PM +0200, Jilles Tjoelker wrote:
 > > > On Fri, 28 Jun 2013 20:45:38 +0300, Konstantin Belousov wrote:
 > > As I said, libunwind relies on the signal blocking behaviour to be able
 > > to unwind from the signal handler.
 >
 > OK :(
 >
 > > > I am a bit concerned, though, that this is only needed for the
 > > > unthreaded programming environment. libthr has an efficient method for
 > > > postponing signals that avoids system calls. Moving that mechanism to
 > > > libc, although it is a bit hard, may be an option.
 >
 > > Well, the right answer then is, in fact, to merge libc and libthr.
 > > I implemented ELF filters as the first required step, but did not
 > > progress the task further.
 >
 > > IMO the merge is mostly mechanical, the complication is due to the fact
 > > that the work should be done in a branch and takes a lot of time. As a
 > > result, the libc and libthr changes during development conflict and
 > > have to be constantly resolved.
 >
 > A full merge would make people unhappy who want a separate unthreaded
 > programming environment so that, for example, libstdc++ can allocate
 > smaller data structures without locks. (Note that this requires breaking
 > pthread_once() in the unthreaded programming environment.)
 libstdc++ could switch to the method used on all other platforms on
 FreeBSD too.  It seems that the presence of the pthread_cancel()
 symbol works as an indicator.  We can easily implement the switch
 locally, and I am sure that upstream will accept such a change.
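 
 A rough sketch of such a symbol-presence check, using the GCC/Clang
 weakref extension. The alias weak_pthread_cancel and the function
 threads_available() are names invented for this example; the result
 depends on the platform's libc (on systems where libc itself defines
 pthread_cancel() it always reports 1).

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Weak alias for pthread_cancel(): its address is NULL when nothing
 * linked into the process defines the symbol.
 */
static __typeof(pthread_cancel) weak_pthread_cancel
    __attribute__((__weakref__("pthread_cancel")));

/* Return 1 if pthread_cancel() is present in the process, else 0. */
static int
threads_available(void)
{
    return (weak_pthread_cancel != NULL);
}
```

 A library could consult threads_available() once and skip locking when
 it returns 0.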
 
 >
 > However, even without pthread_create() and pthread_once(), a lot of
 > functionality could be moved from libthr to libc, assuming we are
 > willing to declare mixing and matching libc and libthr versions
 > completely unsupported (by adding a check). For example, the signal
 > handling, cancellation checks and errno. In the case of dynamic linking,
 > a partial merge will require fewer symbols to be exported in
 > FBSDprivate_1.0, which reduces PLT indirection and will make up for some
 > overhead.
 We already do not support mixing different versions of libc and libthr,
 since libthr copies the pthread stubs over the libc table.
 
 >
 > An example of this partial merge is lib/libc/gen/sem_new.c.
 
 In fact, the export of the pthread* symbols from libc is very unfortunate.
 
 --tdVUlQTxnujgnAno
 Content-Type: application/pgp-signature
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.20 (FreeBSD)
 
 iQIcBAEBAgAGBQJR0MwlAAoJEJDCuSvBvK1BSkwP+wa9pn0DcrxOlt16Kl5bARph
 +yQriobX54RilXMBEuhB07IFxN+iECGgbXieq3iPMhY5Q925doZIjIKViFfnPWfy
 MD4dpQKxpVInL77/GOPeBzGcqNtMg/Ubfn4pKAP9Mmv+mNIjPOVOka8zzDwTCyrR
 NkNMyA4bNtUWVnb6y1yb1t2PUP3W9RTjhbC9xgfCwXzL1IwwWZwMv0+Ate75E5EY
 I7F+eOUqhCWNHUHfw4ezF7ZqB7faY9/ScGJTKbOlJ8C9h0BtnNeb/zCZXsgu4+UN
 Gt6LtpQib5eya05braKOJVaALpksQ+MHTNvWmMmodBLnjHu4fyVfBTmAwNkk+VnQ
 UaXMpGsiuGev4ebfF3ccVDVYgsUXM4mZM+RmyBoMKJ64fNtXO+esBNSBqAPH2GzK
 WE426lq+MPXj4kG5WXGiW7oDY4L1jYM8yEH+E5NL1MdEXZmDiKvPvRQjEfnaB8HJ
 9XgJCKEDSPuxD1lkmLhLzEke0SCN+r+CoDbbyL5MSPzCSQjeYxCy27iO7yg/G/65
 +wzPwLTtR2V/ujuDcYUVXCDff/Uqcs44rlcfQ9fzpOyWa0Z2bZ+7zyVs4xhwhW4Y
 Y8tV+yLXRjwPMyoUv+ODDbEZlUpob2Kj9wiVqB/9r4lwunDpDUzTq1v2+JNl9jTr
 R1ZGkOAUMKWZzxhaJqgj
 =N7Lu
 -----END PGP SIGNATURE-----
 
 --tdVUlQTxnujgnAno--
>Unformatted:
