Subj : Re: pthreads and fork
To   : comp.programming.threads
From : David Hopwood
Date : Mon Feb 21 2005 03:52 am

Marcin 'Qrczak' Kowalczyk wrote:
> Since I implement my own threads (pthreads are used only to make
> some functions asynchronous), I can decide myself about their behavior
> on fork. And since I'm designing the language to make asynchronous
> cancelation quite safe, I decided that fork should cancel other
> threads using the regular mechanism for asynchronous signals;
> in particular if a thread has signals blocked, it will continue
> for some time in the child process.
> 
> Of course not every computation is safe to be forked; if a thread
> has signals blocked and is going to write to stdout, the write might
> happen twice (and C fork has the same problem with I/O buffers).
> But I believe that cancelation is a better default than evaporation,
> at least in a language which supports asynchronous signals, exceptions
> and garbage collection. My equivalent of pthread_atfork can be used
> to make more cases safe and avoid e.g. duplicating output.
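
(To make the I/O-buffer point concrete, here is a minimal sketch in C,
assuming a POSIX system. The helper name copies_written_across_fork is
made up for illustration and is not from any library. It leaves one line
sitting in a stdio buffer, forks, and counts how many copies reach the
file once both processes have flushed.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper (an assumption, not from the post): write one
   unflushed line to a fully buffered stdio stream, fork, and return how
   many lines end up in the file. Returns -1 on error. */
int copies_written_across_fork(void)
{
    FILE *f;
    pid_t pid;
    int status, c, lines = 0;

    fflush(NULL);                 /* keep unrelated streams out of the demo */
    f = tmpfile();
    if (f == NULL)
        return -1;
    setvbuf(f, NULL, _IOFBF, BUFSIZ);
    fputs("hello\n", f);          /* stays in the user-space buffer for now */

    pid = fork();
    if (pid == -1) {
        fclose(f);
        return -1;
    }
    if (pid == 0)
        exit(0);                  /* child: exit() flushes its copy of the buffer */

    if (waitpid(pid, &status, 0) == -1) {
        fclose(f);
        return -1;
    }
    assert(WIFEXITED(status));
    fflush(f);                    /* parent flushes the same buffered bytes again */

    rewind(f);
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```

(Parent and child share the file offset, so the second flush appends
rather than overwrites. Calling fflush before fork avoids it in this
simple case.)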

Why include fork at all in your language?

  - it is most commonly used to create a new process (i.e. fork+exec),
    but for this use it is wasteful (in memory and time, especially if
    the parent process is larger than the child) and has the wrong
    semantics (only bad things can happen in the gap between the fork
    and exec).

  - it's not portable to non-Unix OSes (e.g. good luck implementing
    anything like it on Windows).

  - supporting fork puts undesirable constraints on how other things
    can be implemented. For example you've already found that it
    interacts badly with pthreads -- suppose that you wanted to
    use pthreads more extensively to take advantage of parallelism
    on multiprocessors?

  - its interaction with threads, shared memory, and external I/O is
    too complicated to reason about reliably.

  - cancelling threads after fork is *not* a good default. This makes
    it essentially impossible to write fork-safe libraries that output
    to external resources: on a fork you will end up with two processes
    outputting to the resource, which will invariably trash it.
    Cancelling threads after the fork is too late; they might have had
    enough time to trash the resource before being cancelled, or they
    might do so during cleanup.

    (Evaporating threads has different problems, but they are just as bad.)

  - fork+exec can and should be replaced by popen in almost all cases.
    (The only exception is programs that have called set[e][ug]id,
    but that's a minefield anyway.)

    Uses of fork without exec are so rare that they are not worth all
    of the above hassle. popen (or equivalent, e.g. Win32 CreateProcess),
    OTOH, does not create any of these problems.

  - if you really want to clone a running process, it can still be done
    by persisting the relevant state and explicitly restoring it in a
    new popen'd process. This might sound like a lot of effort, but it
    gives much more control over which resources are copied, shared, or
    otherwise recreated. This control is needed in practical situations.
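
As a rough illustration of the popen route, here is a minimal sketch in C,
assuming a POSIX system. The helper name run_and_capture is hypothetical,
and the command string is passed to the shell, so it must be trusted. It
starts a child without the caller ever calling fork, captures the child's
stdout, and reports its exit code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper (an assumption, not from the post): run `cmd` via
   POSIX popen, copy up to cap-1 bytes of its stdout into buf, and return
   the child's exit code, or -1 on failure or abnormal termination. */
int run_and_capture(const char *cmd, char *buf, size_t cap)
{
    FILE *p;
    size_t n = 0;
    char discard[256];
    int status;

    assert(cmd != NULL && buf != NULL && cap > 0);
    buf[0] = '\0';

    fflush(NULL);                 /* flush our own output before the child runs */
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;

    /* Read until the buffer is full or the child closes its stdout. */
    while (n < cap - 1) {
        size_t got = fread(buf + n, 1, cap - 1 - n, p);
        if (got == 0)
            break;
        n += got;
    }
    buf[n] = '\0';

    /* Drain anything left so the child is not stopped by a closed pipe. */
    while (fread(discard, 1, sizeof discard, p) > 0)
        ;

    status = pclose(p);           /* waits for the child, returns its wait status */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called as run_and_capture("echo hello", out, sizeof out), this should fill
out with the command's output and return its exit code. Nothing from the
parent's threads or heap is duplicated into a running copy of the program.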


fork would never have been included in C/Unix if C had been a
multithreaded language at that point. See also the rationale section at 
<http://www.opengroup.org/onlinepubs/009695399/functions/fork.html>.
An abstraction of POSIX popen / Win32 CreateProcess is what you want.

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
