Subj : Re: A pity that there is no forkall() which clones threads
To   : comp.programming.threads
From : Marcin 'Qrczak' Kowalczyk
Date : Tue Mar 08 2005 08:55 pm

David Hopwood writes:

> I think that's unavoidable. Uninterruptible blocking primitives like
> getaddrinfo are just broken for use in concurrent languages. Windows
> has asynchronous DNS APIs; it's about time POSIX caught up.

On Linux in theory there is getaddrinfo_a, but:

1. Its implementation is broken: a signal doesn't interrupt waiting
   for the result, although it's documented that it should. The code
   assumes that pthread_cond_wait will fail with EINTR when a signal
   is delivered. In reality this error never happens, and a signal
   doesn't even have to cause a spurious wakeup.

2. It's only for name->IP, there is no getnameinfo_a.

3. The implementation is not able to cancel ongoing requests (the API
   allows it to return an error code "not cancellable" and the
   implementation usually does) unless they haven't been passed to
   a worker thread yet.

4. It doesn't provide a way to wait for several kinds of sources of
   events in parallel, only for several getaddrinfo_a calls. Well,
   AFAIR it can raise a signal after finishing. Signals are
   inconvenient.

5. It's implemented in terms of getaddrinfo and pthreads, so it can do
   nothing more than I can do with them by hand (as I currently do).

What is really missing is the ability to multiplex more than I/O and
timeouts, e.g. mixing them with waitpid, pthread_cond_wait, grouping
of such events etc. This would allow writing event loops which don't
have to rely on OS threads for so many things.

>> No, a Kogut signal doesn't necessarily imply cancellation. And even
>> in case it does, it's not immediate - it's propagated as an exception.
>> There is no Kogut API which could be mapped to pthread_cancel.
>
> Yes, but it's possible to implement signals in terms of cancellation.
> Suppose that you have a pool of pthreads that only run foreign functions,
> separate from the Kogut interpreter. When you need to interrupt a foreign
> function with a signal, cancel that pthread, and signal the corresponding
> Kogut thread (which is blocked waiting for the foreign function to finish).
> The fact that you have lost a pthread from the pool isn't a problem
> because they are interchangeable -- new pthreads will be created as
> necessary to run more foreign code.

Close, but unfortunately it doesn't fit the current design. Yes, there
is a pool of OS threads, but it is for running unbound Kogut threads.
A thread becomes bound when it calls C which calls back to Kogut: from
then on all further foreign calls from this Kogut thread are made by
this particular OS thread, which will not be used for other Kogut
threads. This is for libraries which use pthread_getspecific, and to
ensure that C stack frames of foreign calls made by other Kogut threads
will not block the current stack frame when this callback wants to
return.

So it would be impossible to interrupt a thread when the C call is
nested within a Kogut callback started from another C call.

Even if this is the first call to C made by this thread, the C fragment
which has been signalled while doing a blocking C call may go on with
a complex computation, perhaps with callbacks. Since this happens in
a single C snippet, it can't be put aside and resumed from a different
OS thread.

This means that a call could not be interrupted in any case, unless the
API specified that an interruptible C function must be called only as
the last significant thing in a block of C code. Code generation would
have to be changed to accommodate restarting a thread just after a C
call, not only at the beginning of a function. This requires some
annotation where a C snippet is embedded in Kogut code; it can't be
provided as a C macro or the like, because it influences the generated
code. Lots of changes for only a partial result...

> With a little care, callbacks from foreign functions into Kogut can also
> be handled correctly, although that's enough to give anyone a headache.

I don't see how they could be handled. The cancellation handler would
have to include the rest of the computation after all callbacks return.
During this time the thread might even be interrupted again.

Meanwhile I finally implemented interruption of selected calls by Unix
signals (currently used by WaitForProcess with SIGUSR2). I managed to
avoid pthread_getspecific in a signal handler. Caveats:

- During the blocking call, using the same signal for other purposes
  (i.e. letting it be delivered to the process) might cause it to be
  delayed until the next timer tick, instead of being processed almost
  immediately (at the next function call). This is because I had to
  avoid faking the stack overflow condition at delivery of such a
  signal, which is normally used to interrupt normal computation.
  I had to avoid it because more than one OS thread has this signal
  unblocked at the same time and any of them could handle it.
  Concurrent access to the stack limit variable from multiple threads
  is too risky, as I have no means to synchronize with a signal handler
  run by another thread.
- For the same reason memory is sometimes not properly synchronized
  between threads. This applies only to the array of pending signals
  and the flag that a signal is pending. There is only a short window
  between the moment it may be improperly accessed (i.e. modified and
  then read in another thread without synchronization) and the moment
  these threads actually synchronize. If the foreign call spends most
  of its time blocking, and if in the pthreads implementation in use
  blocked threads don't handle signals directed to the whole process
  when there is another thread which has them unblocked, then the
  window for such improper memory synchronization is narrow.
- Between the moment the blocked thread is sent a Kogut signal and the
  moment it actually receives the Unix signal used for interruption,
  the same Unix signal sent to the whole process may be lost. This
  window can take several timer ticks if there are many running
  threads. I could probably make it narrower (between pthread_kill and
  the physical signal handler) at the cost of violating memory
  synchronization. Ugh. I don't know if it's possible to handle that
  reliably.

So you can now interrupt WaitForProcess with a regular Kogut signal and
it doesn't interfere with fork, but using SIGUSR2 at the same time
might not work.

pthreads up to Linux 2.0 used SIGUSR1 and SIGUSR2 for themselves (since
2.2 they use realtime signals), so interrupting WaitForProcess will
surely break there. I wonder whether I should care about such ancient
kernels.

>> In particular in the case of forking the signal causes suspension and
>> later in one of the processes the computation continues, so interruption
>> can't be made with pthread_cancel.
>
> pthread_cancel isn't needed when forking; in that case you are only
> relying on evaporation-on-fork and Kogut signals.

I can implement the evaporable state, but it would not be necessary if
blocking C calls could be interrupted by Kogut signals. And indeed now
some of them can (if a Unix signal can interrupt them).

Since evaporable state would currently be used only for getaddrinfo,
I'm not sure if it's worth the effort. ForkProcessKillThreads doesn't
need it, and getaddrinfo will return after a few seconds anyway.

> Random, possibly unworkable idea: maybe you can have a hidden pthread
> that just does a wait() on all child processes in a loop, and stores
> the results in a shared structure? This pthread would be started the
> first time WaitForProcess is used, or the first time it is used in the
> child after a fork.

This would break on broken pthread implementations where only the
thread which did the fork can wait for the children, i.e. Linux 2.2
and earlier.
Currently fork and wait are done by the same thread if it's bound (in
particular the main thread is bound).

Otherwise it would probably work. It would be more complex than the
current implementation; in particular it would have to reimplement
process searching by process ID or process group ID and the WUNTRACED
flag.

I'm afraid it's not enough of an improvement over the current scheme of
interruption by a signal to be worth doing: it's specific to
WaitForProcess, so the effort won't amortize over other syscalls, and
it requires reimplementing too many things which belong to the system.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/