Subj : Re: A pity that there is no forkall() which clones threads
To   : comp.programming.threads
From : Marcin 'Qrczak' Kowalczyk
Date : Tue Mar 08 2005 08:55 pm

David Hopwood <david.nospam.hopwood@blueyonder.co.uk> writes:

> I think that's unavoidable. Uninterruptible blocking primitives like
> getaddrinfo are just broken for use in concurrent languages. Windows
> has asynchronous DNS APIs; it's about time POSIX caught up.

On Linux in theory there is getaddrinfo_a, but:

1. Its implementation is broken: a signal doesn't interrupt waiting
   for the result, although it's documented that it should. The code
   assumes that pthread_cond_wait will fail with EINTR when a signal
   is delivered. In reality this error never happens; a signal
   doesn't even have to cause a spurious wakeup.

2. It's only for name->IP, there is no getnameinfo_a.

3. The implementation is not able to cancel ongoing requests (the
   API allows it to return a "not cancellable" error code, and the
   implementation usually does) unless they haven't yet been passed
   to a worker thread.

4. It doesn't provide a way to wait for several kinds of sources of
   events in parallel, only waiting for several getaddrinfo_a calls.
   Well, AFAIR it can raise a signal after finishing. Signals are
   inconvenient.

5. It's implemented in terms of getaddrinfo and pthreads, so it can do
   nothing more than using them by hand (as I currently do).
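A minimal sketch of what "using them by hand" could look like: a
worker pthread runs getaddrinfo and reports completion through a pipe,
so the result can be awaited with poll alongside other descriptors.
The struct and function names (lookup, lookup_start, lookup_wait) are
hypothetical and do not come from Kogut or glibc. The sketch assumes
the caller zeroes the hints and keeps the record alive until
lookup_wait returns 1.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <netdb.h>
#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record; none of these names come from the post. */
struct lookup {
    const char *host;        /* name to resolve */
    struct addrinfo hints;   /* passed to getaddrinfo unchanged */
    struct addrinfo *result; /* filled in by the worker */
    int error;               /* getaddrinfo return code */
    int done_fd[2];          /* pipe: worker writes one byte when done */
    pthread_t worker;
};

static void *lookup_worker(void *arg)
{
    struct lookup *lk = arg;
    char byte = 1;

    lk->error = getaddrinfo(lk->host, NULL, &lk->hints, &lk->result);
    /* Wake the waiter; it joins this thread before reading the fields. */
    if (write(lk->done_fd[1], &byte, 1) != 1) {
        /* Nothing useful to do here; the waiter will time out. */
    }
    return NULL;
}

/* Start a lookup. Returns 0 on success; done_fd[0] becomes readable
   once the worker has finished. */
static int lookup_start(struct lookup *lk)
{
    assert(lk != NULL && lk->host != NULL);
    lk->result = NULL;
    lk->error = 0;
    if (pipe(lk->done_fd) != 0)
        return -1;
    if (pthread_create(&lk->worker, NULL, lookup_worker, lk) != 0) {
        close(lk->done_fd[0]);
        close(lk->done_fd[1]);
        return -1;
    }
    return 0;
}

/* Wait up to timeout_ms. Returns 1 if finished, 0 on timeout,
   -1 on a poll error (including EINTR). */
static int lookup_wait(struct lookup *lk, int timeout_ms)
{
    struct pollfd pfd = { .fd = lk->done_fd[0], .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0)
        return n;
    pthread_join(lk->worker, NULL);
    close(lk->done_fd[0]);
    close(lk->done_fd[1]);
    return 1;
}
```

As with getaddrinfo_a, a request that is already running cannot be
cancelled here; a caller that times out can only abandon the worker
and let it finish on its own.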

What is really missing is the ability to multiplex more than I/O and
timeouts, e.g. mixing them with waitpid, pthread_cond_wait, grouping
of such events etc. This would allow writing event loops which don't
have to rely on OS threads for so many things.
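For the waitpid case specifically, a common workaround (not something
the post describes, and not a general answer to the complaint above)
is the self-pipe trick: a SIGCHLD handler writes a byte to a pipe, so
child termination shows up as a readable descriptor in poll. A minimal
sketch, with hypothetical names child_events_init and
wait_child_or_timeout:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int sigchld_pipe[2];

/* Async-signal-safe handler: only write() is called. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    char byte = (char)signo;

    if (write(sigchld_pipe[1], &byte, 1) < 0) {
        /* Pipe full: a wakeup is already pending, nothing is lost. */
    }
    errno = saved_errno;
}

/* Create the non-blocking pipe and install the SIGCHLD handler. */
static int child_events_init(void)
{
    struct sigaction sa;

    if (pipe(sigchld_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(sigchld_pipe[i], F_GETFL);
        if (fl < 0 || fcntl(sigchld_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Wait for child `pid` to exit. Returns its exit code, or -1 on
   timeout, error, or abnormal termination. Other descriptors could be
   added to the pollfd array to mix I/O with child events. The timeout
   restarts after every wakeup, which is good enough for a sketch. */
static int wait_child_or_timeout(pid_t pid, int timeout_ms)
{
    struct pollfd pfd = { .fd = sigchld_pipe[0], .events = POLLIN };

    assert(pid > 0);
    for (;;) {
        int status;
        char buf[64];
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        int n = poll(&pfd, 1, timeout_ms);
        if (n == 0)
            return -1;
        if (n < 0 && errno != EINTR)
            return -1;
        /* Drain pending wakeups, then re-check the child. */
        while (read(sigchld_pipe[0], buf, sizeof buf) > 0) {
        }
    }
}
```

This covers only child termination; it does nothing for
pthread_cond_wait or for grouping events.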

>> No, a Kogut signal doesn't necessarily imply cancellation. And even
>> in case it does, it's not immediate - it's propagated as an exception.
>> There is no Kogut API which could be mapped to pthread_cancel.
>
> Yes, but it's possible to implement signals in terms of cancellation.
> Suppose that you have a pool of pthreads that only run foreign functions,
> separate from the Kogut interpreter. When you need to interrupt a foreign
> function with a signal, cancel that pthread, and signal the corresponding
> Kogut thread (which is blocked waiting for the foreign function to finish).
> The fact that you have lost a pthread from the pool isn't a problem
> because they are interchangeable -- new pthreads will be created as
> necessary to run more foreign code.

Close, but unfortunately it doesn't fit the current design. Yes,
there is a pool of OS threads, but for running unbound Kogut threads.

A thread becomes bound when it calls C which calls back to Kogut: in
this case all further foreign calls from this Kogut thread will be
made by this particular OS thread which will not be used for other
Kogut threads. This is for libraries which use pthread_getspecific,
and to ensure that C stack frames of foreign calls made by other Kogut
threads will not block the current stack frame when this callback
wants to return. So it would be impossible to interrupt a thread when
this C call is nested within a Kogut callback started from another
C call.

Even in case this is the first call to C of this thread, it may
continue a complex computation, perhaps with callbacks, from within
the C fragment which has been signalled while doing a blocking C call.
Since this happens in a single C snippet, it can't be put aside and
resumed from a different OS thread. This means that in no case could
a call be interrupted, unless the API specified that an interruptible
C function may be called only as the last significant thing in a
block of C code.

Code generation would have to be changed to accommodate restarting a
thread just after a C call, not only at the beginning of a function.
This means some annotation is needed where a C snippet is embedded in
Kogut code. It can't be provided as a C macro or the like, because it
influences the generated code.

Lots of changes for only a partial result...

> With a little care, callbacks from foreign functions into Kogut can also
> be handled correctly, although that's enough to give anyone a headache.

I don't see how they could be handled. The cancellation handler would
have to include the rest of the computation after all callbacks return.
During this time the thread might even be interrupted again.

Meanwhile I finally implemented interruption of selected calls by Unix
signals (currently used by WaitForProcess with SIGUSR2). I managed to
avoid pthread_getspecific in a signal handler. Caveats:

- During the blocking call, using the same signal for other purposes
  (i.e. allowing it to be delivered to the process) might cause it to
  be delayed until the next timer tick, instead of being processed
  almost immediately (at the next function call).

  This is because I had to avoid faking the stack overflow condition
  on delivery of such a signal, which is normally used to interrupt
  normal computation. I had to avoid it because more than one OS
  thread has this signal unblocked at the same time and any of them
  could handle it. Concurrent access to the stack limit variable from
  multiple threads is too risky, as I have no means to synchronize
  with a signal handler run by another thread.

- For the same reason memory is sometimes not properly synchronized
  between threads. This applies only to the array of pending signals
  and the flag that a signal is pending. The window is short: it
  lasts from the improper access (i.e. a modification which is then
  read in another thread without synchronization) until these
  threads actually synchronize.

  If the foreign call spends most of its time blocking, and in the
  pthreads implementation in use blocked threads don't handle signals
  directed to the whole process when there is another thread which
  has them unblocked, then the window for such improper memory
  synchronization is narrow.

- Between the moment the blocked thread is sent a Kogut signal and
  the moment it actually receives the Unix signal used for
  interruption, this Unix signal sent to the whole process may be
  lost. This window can take several timer ticks if there are many
  running threads.

  I could probably make it narrower (between pthread_kill and
  the physical signal handler) at the cost of violating memory
  synchronization.

Ugh. I don't know if it's possible to handle that reliably.

So you can now interrupt WaitForProcess with a regular Kogut signal
and it doesn't interfere with fork, but using SIGUSR2 at the same time
might not work.
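The basic mechanism, interrupting a blocking waitpid in one particular
OS thread with a Unix signal, can be sketched as follows. This is an
illustration under assumptions, not the Kogut runtime code: the names
(install_interrupt_signal, waiter_thread, interrupt_waiter) are
hypothetical, and the resend loop is just one simple way to cope with
a signal that arrives before the thread has actually blocked.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: it exists only so that delivery makes a blocking
   call fail with EINTR instead of killing the process. */
static void on_sigusr2(int signo)
{
    (void)signo;
}

static int install_interrupt_signal(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: waitpid must not be restarted */
    return sigaction(SIGUSR2, &sa, NULL);
}

struct waiter {
    pid_t pid;           /* child to wait for */
    int status;          /* raw wait status, valid if not interrupted */
    int interrupted;     /* 1 if waitpid failed with EINTR */
    atomic_int finished; /* set when the thread has left waitpid */
};

static void *waiter_thread(void *arg)
{
    struct waiter *w = arg;
    pid_t r;

    assert(w != NULL);
    r = waitpid(w->pid, &w->status, 0);
    w->interrupted = (r < 0 && errno == EINTR);
    atomic_store(&w->finished, 1);
    return NULL;
}

/* Send SIGUSR2 to the waiting thread until it reports that it has
   left waitpid. A single pthread_kill could be lost if it arrived
   before the thread blocked. */
static void interrupt_waiter(pthread_t thread, struct waiter *w)
{
    struct timespec pause_time = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    while (!atomic_load(&w->finished)) {
        pthread_kill(thread, SIGUSR2);
        nanosleep(&pause_time, NULL);
    }
}
```

Because pthread_kill targets one thread, this part does not depend on
which thread the kernel picks for a process-directed SIGUSR2; the
caveats above concern exactly that other case.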

Until Linux 2.0, pthreads used SIGUSR1 and SIGUSR2 for themselves
(since 2.2 they use realtime signals), so interrupting WaitForProcess
will surely break there. I wonder whether I should care about such
ancient kernels.

>> In particular in the case of forking the signal causes suspension and
>> later in one of the processes the computation continues, so interruption
>> can't be made with pthread_cancel.
>
> pthread_cancel isn't needed when forking; in that case you are only
> relying on evaporation-on-fork and Kogut signals.

I can implement the evaporable state, but it would not be necessary if
blocking C calls could be interrupted by Kogut signals. And indeed now
some of them can (if a Unix signal can interrupt them).

Since evaporable state would currently be used only for getaddrinfo,
I'm not sure if it's worth the effort. ForkProcessKillThreads doesn't
need it, and getaddrinfo will return after a few seconds anyway.

> Random, possibly unworkable idea: maybe you can have a hidden pthread
> that just does a wait() on all child processes in a loop, and stores
> the results in a shared structure? This pthread would be started the
> first time WaitForProcess is used, or the first time it is used in the
> child after a fork.

This would break on broken pthread implementations where only the
thread which did the fork can wait for the children, i.e. Linux 2.2
and earlier. Currently the fork and the wait are done by the same
thread if it's bound (in particular the main thread is bound).

Otherwise it would probably work. It would be more complex than the
current implementation, in particular it would have to reimplement
process searching by process ID or process group ID and the WUNTRACED
flag. I'm afraid it's not enough of an improvement over the current
scheme of interruption by a signal to be worth doing: it's specific to
WaitForProcess, so the effort won't amortize over other syscalls, and
it requires reimplementing too many things which belong to the system.
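To make the trade-off concrete, here is a minimal sketch of the
hidden-thread idea, assuming a pthreads implementation where any
thread may wait for any child. All names (reaper_start, reaper_fork,
reaper_wait) and the fixed-size table are hypothetical. It handles
only lookup by exact process ID; process groups and WUNTRACED are left
out, which is the reimplementation burden mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_REAPED 64 /* statuses beyond this are dropped (sketch only) */

static pthread_mutex_t reap_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reap_cond = PTHREAD_COND_INITIALIZER;
static struct { pid_t pid; int status; } reaped[MAX_REAPED];
static int reaped_count;
static int outstanding; /* children forked but not yet reaped */

static void *reaper_thread(void *arg)
{
    (void)arg;
    for (;;) {
        int status;
        pid_t pid;

        /* Sleep until at least one registered child exists. */
        pthread_mutex_lock(&reap_lock);
        while (outstanding == 0)
            pthread_cond_wait(&reap_cond, &reap_lock);
        pthread_mutex_unlock(&reap_lock);

        pid = waitpid(-1, &status, 0);
        if (pid < 0)
            continue; /* EINTR: retry. ECHILD is not expected as long
                         as only this thread waits for children. */

        pthread_mutex_lock(&reap_lock);
        outstanding--;
        if (reaped_count < MAX_REAPED) {
            reaped[reaped_count].pid = pid;
            reaped[reaped_count].status = status;
            reaped_count++;
        }
        pthread_cond_broadcast(&reap_cond);
        pthread_mutex_unlock(&reap_lock);
    }
}

/* Start the hidden thread. Returns 0 on success. */
static int reaper_start(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, reaper_thread, NULL) != 0)
        return -1;
    return pthread_detach(t);
}

/* Like fork(), but registers the child with the reaper in the parent. */
static pid_t reaper_fork(void)
{
    pid_t pid = fork();

    if (pid > 0) {
        pthread_mutex_lock(&reap_lock);
        outstanding++;
        pthread_cond_broadcast(&reap_cond);
        pthread_mutex_unlock(&reap_lock);
    }
    return pid;
}

/* Block until `pid` has been reaped, then store its raw wait status. */
static int reaper_wait(pid_t pid, int *status)
{
    assert(status != NULL);
    pthread_mutex_lock(&reap_lock);
    for (;;) {
        for (int i = 0; i < reaped_count; i++) {
            if (reaped[i].pid == pid) {
                *status = reaped[i].status;
                reaped[i] = reaped[--reaped_count];
                pthread_mutex_unlock(&reap_lock);
                return 0;
            }
        }
        pthread_cond_wait(&reap_cond, &reap_lock);
    }
}
```

Since reaper_wait blocks in pthread_cond_wait rather than in waitpid,
interrupting it no longer needs a Unix signal. On the other hand, a
full table silently drops a status, and the thread has to be started
again in the child after a fork.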

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
