Subj : Re: pthread_sigmask in single thread application
To   : comp.programming.threads
From : Giancarlo Niccolai
Date : Tue Jan 04 2005 01:21 pm

Harshana wrote:

> Hi all,
> 
> I have a single thread, socket based program that need to handle
> SIG_CHILD when any of its child processes exits. But signals should
> come only when its in select() loop. So I use pthread_sigmask() BLOCK
> and UNBLOCK to prevent SIG_CHLD from coming while working outside of
> select(). But recently I have noticed that signals doesn't get blocked
> with pthread_sigmask(BLOCK). I have seen this kind of behaviour when
> linked without -lpthread. But now irrespective of linking -lpthread
> library, sigmask() doesn't work. Anyone has any idea why its happening
> like this? Can anyone suggest me a solution?
> 
> I'm developing this on "SunOS dev 5.9 Generic_112233-10 sun4u sparc
> SUNW,Sun-Fire" and
> running on "SunOS cpu10 5.9 Generic_112233-10 sun4u sparc
> SUNW,Ultra-Enterprise"
> 
> Thanks in advance,
> Harshana

First of all, use pthread_sigmask instead of sigmask. Secondly, why would
you like to prevent SIGCHLD from arriving outside the select? What
problem would that cause you?

In a multithreaded program, the best choice is probably to dedicate one
thread to receiving all signals or, if the signal handling is complex, one
thread per signal that needs complex handling. What you have to do is:

1) block all the signals in the main thread.
2) launch the signal monitor thread
3) in the monitor thread: accept all the signals you wish to handle.
4) wait forever on a condition variable that will never be signaled...
5) ... until the end of the program; then you signal the condition, and
proceed to thread termination.

Receiving a signal may make pthread_cond_wait return (POSIX allows a
spurious wakeup after the handler has run), so you still have to use the
full condition-variable loop pattern to make that work:

pseudocode MonitorThread:

register_signals_handler()
allow_signals()

lock mutex
while ( !monitor_exit ) {
    cond_wait cond, mutex
}
cleanup
unlock mutex

return;
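
The steps and the pseudocode above could be sketched in C roughly as
follows. This is only a minimal sketch under assumptions: SIGCHLD is taken
as the one signal of interest, and the names start_monitor, stop_monitor,
on_sigchld and child_events are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

static pthread_mutex_t monitor_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  monitor_cond  = PTHREAD_COND_INITIALIZER;
static int monitor_exit = 0;                    /* protected by monitor_mutex */
static volatile sig_atomic_t child_events = 0;  /* written by the handler only */

/* Hypothetical handler: it only sets a flag, nothing else. */
static void on_sigchld(int signo)
{
    (void)signo;
    child_events = 1;
}

static void *monitor_thread(void *arg)
{
    struct sigaction sa;
    sigset_t handled;
    (void)arg;

    /* register_signals_handler() */
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, NULL);

    /* allow_signals(): only this thread accepts SIGCHLD */
    sigemptyset(&handled);
    sigaddset(&handled, SIGCHLD);
    pthread_sigmask(SIG_UNBLOCK, &handled, NULL);

    /* wait on a condition that is signaled only at program end */
    pthread_mutex_lock(&monitor_mutex);
    while (!monitor_exit)
        pthread_cond_wait(&monitor_cond, &monitor_mutex);
    pthread_mutex_unlock(&monitor_mutex);
    return NULL;
}

/* Steps 1 and 2: block everything in the caller, then launch the monitor.
   The new thread inherits the fully blocked mask. */
int start_monitor(pthread_t *tid)
{
    sigset_t all;
    int rc;

    sigfillset(&all);
    rc = pthread_sigmask(SIG_BLOCK, &all, NULL);
    if (rc != 0)
        return rc;
    return pthread_create(tid, NULL, monitor_thread, NULL);
}

/* Step 5: signal the condition and wait for the monitor to terminate. */
int stop_monitor(pthread_t tid)
{
    pthread_mutex_lock(&monitor_mutex);
    monitor_exit = 1;
    pthread_cond_signal(&monitor_cond);
    pthread_mutex_unlock(&monitor_mutex);
    return pthread_join(tid, NULL);
}
```

The main thread would call start_monitor() before creating any other
thread, so that every thread created later also starts with all signals
blocked, and stop_monitor() on the way out.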

-------------
From the signal handler, many things are unsafe to do, including locking
mutexes. This means that you should NOT lock mutexes from the signal
handler; however, your other threads may still be running, and this can
make it quite hard to update things in them. However, pthread_cond_wait
may return after a signal handler has run in the waiting thread. POSIX
does not guarantee this (the thread may simply resume waiting), and the
function never returns EINTR, so you cannot detect the interruption from
its return value. What you can do is check, every time the wait returns,
for the update the signal handler has made for you.

In fact, the signal handler runs in the same thread as the waiter, so you
can use static global variables (of type volatile sig_atomic_t) to
communicate between the signal handler function and the waiter. Soon after
a condition wait returns, you can just:
1) block signals
2) check the variables the handler has set
3) enforce the update on the other threads and synchronize with them
4) reaccept signals
5) loop again and wait.
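
Steps 1, 2 and 4 above might look like this in C. It is a sketch under
assumptions: SIGCHLD is the only signal handled, and the names
pending_children, install_sigchld_handler and collect_signal_updates are
invented for the example. Step 3 is left as a comment because it depends
on the application.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Flag shared between the handler and the waiter (same thread). */
static volatile sig_atomic_t pending_children = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    pending_children = 1;   /* no mutexes, no allocation in here */
}

int install_sigchld_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Returns 1 if the handler recorded something since the last call,
   0 if not, -1 if the signal mask could not be changed. */
int collect_signal_updates(void)
{
    sigset_t block, old;
    int seen;

    /* 1) block signals, so the handler cannot run while we read */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (pthread_sigmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    /* 2) check (and reset) what the handler has set */
    seen = pending_children;
    pending_children = 0;

    /* 3) here it is safe to lock mutexes and update the other threads */

    /* 4) reaccept signals by restoring the previous mask */
    if (pthread_sigmask(SIG_SETMASK, &old, NULL) != 0)
        return -1;
    return seen;
}
```

The waiter would call collect_signal_updates() each time its condition
wait returns, then loop and wait again (step 5).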

This, however, raises another problem (uhm...): a signal may arrive AFTER
you have enabled signals and BEFORE you are waiting on the condition. The
update made by that signal would then not be handled right away (it might
be handled only later on, after another signal unblocks the wait)...

That's where an atomic accept_signals_and_cond_wait function would be a
nice addition to the POSIX standard... The pselect (not select) function
provides that, partially: if a file descriptor is used by another part of
the application to "signal" the pselect, then you are guaranteed that the
signals are accepted only during a wait that can also be interrupted by
writing to the file descriptor. The funny thing is that pselect is not
correctly implemented everywhere. From the Linux man page of pselect
(2.6.10):

---------
The idea of pselect is that if one wants to wait for an event, either a
signal or something on a file descriptor, an atomic test is needed to
prevent race conditions. (Suppose the signal handler sets a global flag and
returns. Then a test of this global flag followed by a call of select()
could hang indefinitely if the signal arrived just after the test but just
before the call. On the other hand, pselect allows one to first block
signals, handle the signals that have come in, then call pselect() with the
desired sigmask, avoiding the race.)
Since Linux today does not have a pselect() system call, the current glibc2
routine still contains this race.
----------

It's more or less like saying: it's as unsafe as an unblock; select; block;
or an unblock; cond_wait; block; sequence.
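
For the single-threaded select() case, the pselect pattern the man page
describes could be sketched like this. It assumes a platform where pselect
really is atomic, and the names got_sigchld, setup_sigchld and
wait_fd_or_sigchld are hypothetical. SIGCHLD stays blocked everywhere
except inside pselect itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;
}

/* Install the handler and block SIGCHLD for normal execution.
   *wait_mask receives the mask to use while waiting: the caller's
   previous mask with SIGCHLD removed. Returns 0 on success. */
int setup_sigchld(sigset_t *wait_mask)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (pthread_sigmask(SIG_BLOCK, &block, wait_mask) != 0)
        return -1;
    sigdelset(wait_mask, SIGCHLD);
    return 0;
}

/* Wait until fd is readable, SIGCHLD has been handled, or the timeout
   expires. Returns 2 for SIGCHLD, 1 for a readable fd, 0 for a timeout,
   -1 for any other error. */
int wait_fd_or_sigchld(int fd, const sigset_t *wait_mask,
                       const struct timespec *timeout)
{
    fd_set rfds;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* SIGCHLD is unblocked only for the duration of this call. */
    n = pselect(fd + 1, &rfds, NULL, NULL, timeout, wait_mask);

    /* SIGCHLD is blocked again here, so this test cannot race. */
    if (got_sigchld) {
        got_sigchld = 0;
        return 2;
    }
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

A SIGCHLD that arrives while the program is working outside the wait stays
pending and is delivered as soon as pselect starts waiting.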

(The only other thing without pselect that comes readily to my mind is:
add ANOTHER thread, in which signals are blocked, that periodically
signals the condition, changing a variable like signal_check_tick so that
the waiter knows what it is expected to do; make the signal handler store
its updates in a growable structure, e.g. a linked list (with nodes
allocated in advance, since malloc is not async-signal-safe). In this way,
you are sure that signals get handled within a fixed maximum time no
matter what, which should be good enough for any non-realtime application.
Actually, you can't get much better in MT, as you ANYHOW have some
propagation time due to the synchronization between the updater and all
the interested threads; this method would just add a delay in some cases,
and some CPU overhead, but neither should be significant with respect to
the CPU consumption of the main program and to the best propagation time
of the update.)
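
The ticker idea could be sketched as below. This is only an illustration
under assumptions: the function names and the ticker_stop flag are
invented, only signal_check_tick comes from the text above, and the
handler side (where the updates are stored) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

static pthread_mutex_t tick_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  tick_cond  = PTHREAD_COND_INITIALIZER;
static unsigned long signal_check_tick = 0;  /* protected by tick_mutex */
static int ticker_stop = 0;                  /* protected by tick_mutex */

/* The extra thread: signals blocked, wakes the waiter once per interval. */
static void *ticker_thread(void *arg)
{
    const struct timespec *interval = arg;
    sigset_t all;

    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, NULL);

    for (;;) {
        int stop;

        nanosleep(interval, NULL);
        pthread_mutex_lock(&tick_mutex);
        stop = ticker_stop;
        if (!stop) {
            signal_check_tick++;
            pthread_cond_signal(&tick_cond);
        }
        pthread_mutex_unlock(&tick_mutex);
        if (stop)
            return NULL;
    }
}

/* Waiter side: sleep until the tick moves past last_seen and return the
   new value. The caller then checks what the signal handler stored. */
unsigned long wait_for_tick(unsigned long last_seen)
{
    unsigned long now;

    pthread_mutex_lock(&tick_mutex);
    while (signal_check_tick == last_seen)
        pthread_cond_wait(&tick_cond, &tick_mutex);
    now = signal_check_tick;
    pthread_mutex_unlock(&tick_mutex);
    return now;
}

/* interval must stay valid for as long as the ticker thread runs. */
int start_ticker(pthread_t *tid, struct timespec *interval)
{
    return pthread_create(tid, NULL, ticker_thread, interval);
}

int stop_ticker(pthread_t tid)
{
    pthread_mutex_lock(&tick_mutex);
    ticker_stop = 1;
    pthread_mutex_unlock(&tick_mutex);
    return pthread_join(tid, NULL);
}
```

The interval passed to start_ticker() is the fixed maximum delay mentioned
above: an update stored by the handler waits at most one interval before
the waiter looks at it.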

Giancarlo.
