[HN Gopher] Way too many ways to wait on a child process with a ...
___________________________________________________________________
Way too many ways to wait on a child process with a timeout
Author : broken_broken_
Score : 83 points
Date : 2024-11-10 23:01 UTC (2 days ago)
(HTM) web link (gaultier.github.io)
(TXT) w3m dump (gaultier.github.io)
| nf3 wrote:
| FWIW io_uring does have support for waitid.
|
| https://www.man7.org/linux/man-pages/man3/io_uring_prep_wait...
| broken_broken_ wrote:
| Many thanks! I have added it to the article in due form now.
| EdSchouten wrote:
| An interesting aspect of waitid is that it allows you to access
| the full exit code of the process (i.e., the entire int instead
| of just the bottom 8 bits).
|
| Unfortunately, many operating systems implement waitid() on top
| of one of the older APIs, meaning the top bits get lost
| regardless...
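The sketch below shows how a child's status is collected with waitid(). It fills in a siginfo_t rather than packing everything into one int the way wait()/waitpid() do. The helper name exit_value_via_waitid is hypothetical. In line with the caveat above, this does not give you the full int everywhere. Linux, for instance, keeps only the low 8 bits of the value passed to _exit(), so the example sticks to values that fit.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then collect
 * its status with waitid(). Returns the exit value, or -1 on error or if
 * the child did not exit normally. */
int exit_value_via_waitid(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child */

    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0)
        return -1;
    assert(info.si_pid == pid);
    /* For CLD_EXITED, si_status holds the exit value; for CLD_KILLED or
     * CLD_DUMPED it would hold the signal number instead. */
    return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```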
| xchip wrote:
| Thanks for this great article; it is going to be very useful for
| my project. I am currently developing an open source Android
| native app that invokes rsync when a file gets closed (i.e., when
| you take a picture).
|
| https://github.com/aguaviva/Syncy
| nasretdinov wrote:
| So many ways, and no one mentioned threads?
|
| Edit: by threads I mean creating a new thread to wait for the
| process, and then kill the process after a certain timeout if the
| process hasn't terminated. I guess I'm spoiled by Go...
| zbentley wrote:
| The threading approach is roughly:
|
| 1. Start a thread
|
| 2. That thread starts a child process and signals "started" by
| storing its PID somewhere globally-visible (and hopefully
| atomic/lock-protected).
|
| 3. The thread then blocks in wait(2), taking advantage of its
| non-main-thread-ness to avoid some signals and optionally
| masking/ignoring some more.
|
| 4. When the process exits, the thread can write
| exitstatus/"completed" to the globally-visible state next to
| PID. The thread then exits.
|
| 5. External observers wait for the process with a timeout by
| attempting to join the thread with a timeout. If the timeout
| occurs, they can access the globally-visible PID and send a
| signal to it.
|
| This is missing from the article (EDIT: it has since been
| added, thanks!). That doesn't mean it's a good solution on many
| platforms. It's more costly in resources (thread stack), more
| code than most of the listed options, vulnerable to PID-reuse
| problems that can cause a kill signal to go to the wrong
| process, likely plays poorly with spawning methods that request
| a SIGCHLD be sent to the parent on exit (and plays poorly with
| signals in general if any customization is needed there), and
| is probably often slower than most of TFA's alternatives as
| well, both due to syscall count and pessimal thread/scheduler
| switching conditions. Additionally, it multiplexes/composes to
| large numbers of processes poorly and with a high resource
| cost.
|
| EDIT: Golang's version of this is less bad than described
| above, but not perfect. Go's spawning infrastructure mitigates
| resource cost (goroutines, with their small growable stacks,
| are not as heavy as
| threads), is vulnerable to PID-reuse (as are most platforms'
| operations in this area), addresses the SIGCHLD risk through
| the runtime and signal channels, and mitigates slowness with a
| very good scheduler. For multiplexing, I would assume (but I
| have not verified) that the Go runtime is internally using
| pidfds/kqueue where supported. Where not supported, I would
| assume Go is internally tracking spawn requests through its
| stdlib, handling SIGCHLD, and has a single global routine
| calling wait(2) without a specific PID, waking goroutines
| waiting on a watched PID when it comes out of the call to
| wait(2).
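The threading approach in the comment above could look roughly like this with POSIX threads. This is a sketch under several assumptions. The names (struct watch, waiter_thread, run_with_timeout) are made up. argv[0] must be an absolute path. A condition variable with a deadline stands in for "join with a timeout", and glibc's pthread_timedjoin_np would be a non-portable alternative. The sketch also departs from the comment in one way. The waiter first waits with WNOWAIT, so the child stays unreaped (and its PID reserved) until completion has been published under the mutex. That closes the PID-reuse window mentioned above, at least for this kill().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* The "globally-visible" state from steps 2 and 4, guarded by a mutex. */
struct watch {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    char *const *argv;   /* argv[0] must be an absolute path (execv) */
    pid_t pid;           /* set by the waiter thread; -1 if fork() failed */
    bool started, exited;
    int status;          /* raw wait status; read only after pthread_join */
};

static void *waiter_thread(void *arg) {
    struct watch *w = arg;
    pid_t pid = fork();                   /* step 2: start the child */
    if (pid == 0) {
        execv(w->argv[0], w->argv);
        _exit(127);                       /* exec failed */
    }
    pthread_mutex_lock(&w->mu);
    w->pid = pid;
    w->started = true;
    pthread_cond_broadcast(&w->cv);
    pthread_mutex_unlock(&w->mu);
    if (pid < 0)
        return NULL;

    /* Step 3: block until the child exits. WNOWAIT leaves it a zombie, so
     * its PID cannot be reused while the other thread may still kill it. */
    siginfo_t info;
    while (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0 &&
           errno == EINTR) {
    }
    pthread_mutex_lock(&w->mu);           /* step 4: publish completion */
    w->exited = true;
    pthread_cond_broadcast(&w->cv);
    pthread_mutex_unlock(&w->mu);

    while (waitpid(pid, &w->status, 0) < 0 && errno == EINTR) {
    }                                     /* now actually reap it */
    return NULL;
}

/* Step 5. Returns 1 if the command exited within timeout_ms, 0 if it was
 * killed on timeout, -1 on error; *status receives the raw wait status. */
int run_with_timeout(char *const argv[], long timeout_ms, int *status) {
    assert(timeout_ms >= 0);
    struct watch w = {.mu = PTHREAD_MUTEX_INITIALIZER,
                      .cv = PTHREAD_COND_INITIALIZER,
                      .argv = argv, .pid = -1};
    pthread_t tid;
    if (pthread_create(&tid, NULL, waiter_thread, &w) != 0)
        return -1;

    /* Absolute deadline; condition variables use CLOCK_REALTIME by default. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int result = 1;
    pthread_mutex_lock(&w.mu);
    while (!w.started)                    /* fork() is quick: wait untimed */
        pthread_cond_wait(&w.cv, &w.mu);
    if (w.pid < 0) {
        result = -1;
    } else {
        int rc = 0;
        while (!w.exited && rc != ETIMEDOUT)
            rc = pthread_cond_timedwait(&w.cv, &w.mu, &deadline);
        if (!w.exited) {                  /* timed out; child not reaped */
            kill(w.pid, SIGKILL);
            result = 0;
        }
    }
    pthread_mutex_unlock(&w.mu);
    pthread_join(tid, NULL);              /* the thread reaps the child */
    *status = w.status;
    return result;
}
```

As the comment says, this costs a thread per child. It does not compose well to many processes either.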
| broken_broken_ wrote:
| Thanks for the suggestion, I have added a short section about
| threads.
| nasretdinov wrote:
| Thanks. I believe that Go indeed _could_ use those APIs to
| wait for the child more efficiently if they chose to, but the
| current implementation suggests that they're just calling
| wait4() in a separate thread:
| https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/...
|
| To be fair, in Go, process spawning is very inefficient to
| begin with, since it requires lots of runtime coordination to
| not mess with the threads/goroutines state during fork, so
| running wait4() in a separate thread (although the thread can
| be re-used afterwards) is not the biggest concern here.
| machine_coffee wrote:
| Lol, the author's thought process mirrored mine. As I was
| reading the article, I was thinking, 'doesn't kqueue support
| that?'... and then came a section on kqueue. Then I was thinking
| to myself, 'so how does the Linux implementation do it, then?'...
| and I was just about to start trawling the source code when 'A
| parenthesis..'
|
| Great article. Sorry to say though, Windows does manage all this
| in a more consistent way - but I guess they had the benefit of a
| clean slate.
| silon42 wrote:
| signalfd / process descriptors are the Windows-style
| mechanism... what is missing are a few things like a 'spawn' that
| returns an fd directly (eliminating races...)
| blibble wrote:
| there is no race from the parent
|
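| the pid will not be reused until the child is reaped, i.e. until
| you wait for it (or until you set SIGCHLD to SIG_IGN, which makes
| the kernel reap children automatically)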
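This is why a parent can fork() first and only afterwards ask for a pidfd for the child without racing against PID reuse. It can then wait with a timeout by polling the pidfd. Below is a minimal Linux-only sketch, assuming Linux 5.3 or later. run_with_pidfd_timeout and open_pidfd are hypothetical names, and pidfd_open is invoked through syscall(2) so the sketch does not depend on a libc wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434   /* Linux 5.3+; the number on most architectures */
#endif

/* Raw syscall, so this also builds where libc has no pidfd_open() wrapper. */
static int open_pidfd(pid_t pid) {
    return (int)syscall(SYS_pidfd_open, pid, 0);
}

/* Returns 1 if the command exited within timeout_ms, 0 if it was killed on
 * timeout, -1 on error; *status receives the raw wait status. */
int run_with_pidfd_timeout(char *const argv[], int timeout_ms, int *status) {
    assert(timeout_ms >= 0);              /* poll() treats < 0 as "forever" */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);
    }
    /* No race: until we reap the child, its PID cannot be reused, even if
     * it has already exited and is sitting as a zombie. */
    int pfd = open_pidfd(pid);
    int n = -1;
    if (pfd >= 0) {
        struct pollfd p = {.fd = pfd, .events = POLLIN};
        /* Sketch: on EINTR this restarts with the full timeout. */
        do {
            n = poll(&p, 1, timeout_ms);  /* readable once the child exits */
        } while (n < 0 && errno == EINTR);
        close(pfd);
    }
    if (n <= 0)
        kill(pid, SIGKILL);               /* still unreaped: still our child */
    while (waitpid(pid, status, 0) < 0 && errno == EINTR) {
    }
    return n > 0 ? 1 : (n == 0 ? 0 : -1);
}
```

Using plain kill() on timeout is safe for the same reason. The child is not reaped until the final waitpid().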
| JackSlateur wrote:
| What is the meaning of this code?
|
|     void on_sigchld(int sig) { (void)sig; }
| naruhodo wrote:
| If it's C code, that is the way to suppress a compiler warning
| about sig being unused. In C++ you can omit (or comment out)
| the parameter name, e.g.:
|
|     // C++
|     void on_sigchld(int /*sig*/) {}
| JackSlateur wrote:
| Thank you
|
| So this would be a way which predates C23's maybe_unused
| attribute[1]
|
| Nice trick
|
| [1] https://en.cppreference.com/w/c/language/attributes/maybe_un...
| kevin_thibedeau wrote:
| That is K&R C syntax, supported up to C18. The solution to
| tools emitting unwanted diagnostics is not to appease them
| with pointless cruft but to shut off the diagnostic:
|
|     -Wall -Wextra -Wno-unused-parameter
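Why would a handler with an empty body like on_sigchld exist at all? One common pattern, assumed here and not necessarily the article's exact code, is to block SIGCHLD, fork, and then sleep in pselect() with SIGCHLD unblocked. Under the default disposition, SIGCHLD is ignored and would not interrupt the sleep. Installing a handler, even an empty one, makes pselect() return with EINTR when the child exits. The sketch below uses hypothetical helper names and assumes a single-threaded caller.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty on purpose: its only job is to make SIGCHLD interrupt pselect(). */
static void on_sigchld(int sig) { (void)sig; }

static long ms_since(const struct timespec *t0) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - t0->tv_sec) * 1000L +
           (now.tv_nsec - t0->tv_nsec) / 1000000L;
}

/* Returns 1 if the command exited within timeout_ms, 0 if it was killed on
 * timeout, -1 on error; *status receives the raw wait status. */
int run_with_sigchld_timeout(char *const argv[], long timeout_ms, int *status) {
    assert(timeout_ms >= 0);
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;           /* no SA_RESTART: we want EINTR */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, &old_sa);

    /* Block SIGCHLD before fork() so an early exit stays pending. */
    sigset_t block, orig, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &orig);
    wait_mask = orig;
    sigdelset(&wait_mask, SIGCHLD);       /* mask used only while sleeping */

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &orig, NULL);
        execv(argv[0], argv);
        _exit(127);
    }
    if (pid > 0) {
        struct timespec t0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (;;) {
            pid_t r = waitpid(pid, status, WNOHANG);
            if (r == pid) { result = 1; break; }
            if (r < 0 && errno != EINTR) break;    /* result stays -1 */
            long left = timeout_ms - ms_since(&t0);
            if (left <= 0) {                        /* timed out */
                kill(pid, SIGKILL);
                while (waitpid(pid, status, 0) < 0 && errno == EINTR) {
                }
                result = 0;
                break;
            }
            struct timespec ts = {left / 1000, (left % 1000) * 1000000L};
            /* Atomically unblock SIGCHLD and sleep. A pending or newly
             * arriving SIGCHLD runs on_sigchld and ends the sleep (EINTR). */
            pselect(0, NULL, NULL, NULL, &ts, &wait_mask);
        }
    }
    sigprocmask(SIG_SETMASK, &orig, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```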
| moron123 wrote:
| Parenting 101
| eduction wrote:
| He mentions Bryan Cantrill in there and I can't resist posting
| his famous epoll/kqueue rant:
|
| https://youtu.be/l6XQUciI-Sc?t=3643
|
| I know this is related, but maybe someone smarter than me can
| explain how closely it relates (or doesn't) to this issue, which
| seems more general (IIRC Cantrill was talking about fs events,
| not child processes in general).
| o11c wrote:
| > Because the Linux kernel coalesces SIGCHLD (and other signals),
| the only way to reliably determine if a monitored process has
| exited, is to loop through all PIDs registered by any kqueue when
| we receive a SIGCHLD. This involves many calls to waitid(2) and
| may have a negative performance impact.
|
| This is somewhat wrong. To speed things up in the happy case
| (where we are the only part of the program that is spawning
| children), you can just do a `WNOHANG` wait for _any_ child
| first, and check if it's one of the children we care about. Only
| if it's an unknown child do you have to do the full loop (of
| course, if you only have a couple of children the loop may be
| better).
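Here is a sketch of that happy-path idea, slightly generalized. One SIGCHLD can stand for several exits, so the code keeps calling waitpid(-1, ..., WNOHANG) until no exited child is left. It then matches each reaped PID against its own table. The struct child table and reap_exited are hypothetical. The sketch assumes the caller is allowed to reap any child of the process.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the children this code spawned itself. */
struct child {
    pid_t pid;
    bool exited;
    int status;
};

/* Call after a SIGCHLD (outside the signal handler). Reaps every child that
 * has exited so far. Returns how many reaped children were not in `kids`;
 * those belonged to someone else and are now reaped, so a real program
 * would have to pass their status on to whoever spawned them. */
size_t reap_exited(struct child *kids, size_t n) {
    assert(kids != NULL || n == 0);
    size_t unknown = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid == 0)
            break;                        /* children remain, none exited */
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;                        /* ECHILD: no children at all */
        }
        bool known = false;
        for (size_t i = 0; i < n && !known; i++) {
            if (kids[i].pid == pid) {
                kids[i].exited = true;
                kids[i].status = status;
                known = true;
            }
        }
        if (!known)
            unknown++;
    }
    return unknown;
}
```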
| akira2501 wrote:
| > I would prefer extending poll to support things other than file
| descriptors, instead of converting everything a file descriptor
| to be able to use poll.
|
| Why? The ability to block on these descriptors as a one-off,
| rather than wrapping them in a poll, makes them extremely useful
| and avoids the race issues that exist with signal handlers and
| other non-blocking mechanisms.
|
| signalfd, timerfd, eventfd, userfaultfd, pidfd are all great
| applications of this strategy.
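The sketch below illustrates signalfd in this role. It is Linux-only, uses a hypothetical function name, and assumes a single-threaded caller. It turns SIGCHLD into a readable descriptor and waits on it with poll() and a timeout, so no signal handler runs at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the command exited within timeout_ms, 0 if it was killed on
 * timeout, -1 on error; *status receives the raw wait status. */
int run_with_signalfd_timeout(char *const argv[], int timeout_ms, int *status) {
    assert(timeout_ms >= 0);
    sigset_t mask, orig;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    /* Blocked signals stay pending, and signalfd reports pending ones. */
    if (sigprocmask(SIG_BLOCK, &mask, &orig) != 0)
        return -1;
    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    pid_t pid = (sfd >= 0) ? fork() : -1;
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &orig, NULL);
        execv(argv[0], argv);
        _exit(127);
    }
    int result = -1;
    if (pid > 0) {
        struct pollfd p = {.fd = sfd, .events = POLLIN};
        result = 0;
        /* Sketch: each wakeup restarts the full timeout. */
        while (poll(&p, 1, timeout_ms) > 0) {
            struct signalfd_siginfo si;
            if (read(sfd, &si, sizeof si) != (ssize_t)sizeof si)
                break;
            /* SIGCHLDs coalesce and may come from other children, so ask
             * waitpid() instead of trusting si.ssi_pid. */
            if (waitpid(pid, status, WNOHANG) == pid) {
                result = 1;
                break;
            }
        }
        if (result == 0) {
            kill(pid, SIGKILL);
            while (waitpid(pid, status, 0) < 0 && errno == EINTR) {
            }
        }
    }
    if (sfd >= 0)
        close(sfd);
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return result;
}
```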
| greggyb wrote:
| Not so much about timeouts, but related in that it is based
| around managing child processes:
|
| The lineage of tools descending from daemontools for service
| management is worth exploring:
|
| daemontools: http://cr.yp.to/daemontools.html
|
| runit: https://smarden.org/runit/
|
| s6: https://skarnet.org/software/s6/
|
| dinit: https://davmac.org/projects/dinit/
| adrianmonk wrote:
| Tenth Approach: fork() two processes.
|
| Child 1 exec()s the command.
|
| Child 2 does this:
|
|     signal(SIGALRM, alarm_handler);
|     alarm(timeout_length);
|     pause();
|     exit(0);
|
| Start both children, then call wait(), which blocks until _any_
| child exits and returns the pid of the child that exited. If it's
| the command child, then your command finished. If it's the other
| child, then the timeout expired.
|
| Now that one child has exited, kill() the other child with
| SIGTERM and reap it by calling wait() again.
|
| All of this assumes you'll only have these two children going,
| but if you're writing a small exponential backoff command retry
| utility, that should be OK.
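Here is a sketch of this tenth approach in C, under the same assumption that these are the only two children. run_with_timer_child is a hypothetical name. The timer child uses a sleep() loop instead of the alarm()/pause() pair, which avoids the small window where SIGALRM could fire before pause() is reached. The second reap uses waitpid() on the remaining child rather than a bare wait().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the command finished first, 0 if the timer child won (the
 * command is then sent SIGTERM), -1 on error; *status receives the
 * command's raw wait status. Assumes no other children exist. */
int run_with_timer_child(char *const argv[], unsigned timeout_s, int *status) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t cmd = fork();
    if (cmd < 0)
        return -1;
    if (cmd == 0) {                       /* child 1: exec the command */
        execv(argv[0], argv);
        _exit(127);
    }
    pid_t timer = fork();
    if (timer < 0) {
        kill(cmd, SIGTERM);
        waitpid(cmd, status, 0);
        return -1;
    }
    if (timer == 0) {                     /* child 2: the timer */
        unsigned left = timeout_s;
        while (left > 0)
            left = sleep(left);           /* sleep() returns time not slept */
        _exit(0);
    }

    int first_status;
    pid_t first;
    while ((first = wait(&first_status)) < 0 && errno == EINTR) {
    }
    if (first < 0)
        return -1;
    int finished = (first == cmd);
    pid_t other = finished ? timer : cmd;
    int other_status;
    kill(other, SIGTERM);                 /* stop whichever child is left */
    while (waitpid(other, &other_status, 0) < 0 && errno == EINTR) {
    }
    *status = finished ? first_status : other_status;
    return finished;
}
```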
___________________________________________________________________
(page generated 2024-11-13 23:01 UTC)