[HN Gopher] The GNU make jobserver Implementation (2015)
       ___________________________________________________________________
        
       The GNU make jobserver Implementation (2015)
        
       Author : fanf2
       Score  : 82 points
       Date   : 2024-11-26 18:42 UTC (6 days ago)
        
 (HTM) web link (make.mad-scientist.net)
 (TXT) w3m dump (make.mad-scientist.net)
        
       | chasil wrote:
       | Does the xargs -P option also work this way?
        
         | nwellnhof wrote:
         | xargs doesn't recursively invoke itself, so you don't have the
         | problem that make has.
        
           | Polizeiposaune wrote:
           | The xargs command can invoke arbitrary commands, possibly
           | including scripts that themselves invoke xargs, so a system
           | using xargs -P recursively could very well benefit from a
           | governor limiting the total number of jobs run in parallel.
           | It might be unlikely or uncommon (I've certainly never seen
           | it in practice) but nothing prevents it from happening.
           | 
           | And if you're using xargs somewhere in the middle of a
           | parallel build, having xargs -P respect the jobserver's
           | limits would be a nice touch.
        
       | kazinator wrote:
       | > convincing myself that reads and writes of one-byte tokens to a
       | pipe were atomic and robust
       | 
       | :)
        
         | crest wrote:
         | Transfers up to PIPE_BUF are atomic on pipes according to
         | POSIX.
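         |
         | A minimal sketch of the token scheme that this guarantee makes
         | workable, assuming a plain anonymous pipe and a "+" token byte.
         | The helper names are hypothetical and this is not GNU make's
         | own code:

```python
import os

def make_token_pipe(slots):
    """Create a pipe preloaded with `slots` one-byte tokens."""
    r, w = os.pipe()
    # A handful of bytes is far below PIPE_BUF, so the write is atomic.
    os.write(w, b"+" * slots)
    return r, w

def acquire(r):
    """Block until a token is available, then return that one byte."""
    return os.read(r, 1)

def release(w, token):
    """Write the token back so another job may start."""
    os.write(w, token)
```

         | Each job holds one byte while it runs and writes the same byte
         | back when it finishes, so the number of bytes in the pipe is
         | the number of free slots.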
        
       | badmintonbaseba wrote:
       | It's interesting that waiting on a signal and a pipe read at the
       | same time is the hard part. Would be interesting to track down
       | how this happens to be implemented if you do this from a higher
       | level async framework, like python's asyncio.
        
         | awesomerob wrote:
         | I'd guess it would be similar to (if not the same as) the
         | internal pipe + select() approach mentioned in the post
         | (although most likely with epoll or whatever instead of
         | select).
        
         | nerdponx wrote:
         | As far as I know, none of the tricky Unix details here would be
         | handled by Python. The "async" part isn't the problem, it's the
         | coordination around syscalls.
        
           | badmintonbaseba wrote:
            | I wonder how this fails then:
            | 
            |     import asyncio
            |     import os
            |     import signal
            |     import sys
            | 
            |     async def get_stdin_reader():
            |         loop = asyncio.get_running_loop()
            |         reader = asyncio.StreamReader()
            |         protocol = asyncio.StreamReaderProtocol(reader)
            |         await loop.connect_read_pipe(lambda: protocol, sys.stdin)
            |         return reader
            | 
            |     async def amain():
            |         print(f'pid: {os.getpid()}')
            | 
            |         sig_queue: asyncio.Queue[None] = asyncio.Queue()
            |         def sig_handler() -> None:
            |             sig_queue.put_nowait(None)
            | 
            |         loop = asyncio.get_running_loop()
            |         loop.add_signal_handler(signal.SIGUSR1, sig_handler)
            |         reader = await get_stdin_reader()
            | 
            |         task1 = loop.create_task(reader.read(1))
            |         task2 = loop.create_task(sig_queue.get())
            |         pending = {task1, task2}
            |         while True:
            |             done, pending = await asyncio.wait(
            |                 pending, return_when=asyncio.FIRST_COMPLETED)
            |             if task1 in done:
            |                 task1 = loop.create_task(reader.read(1))
            |                 pending.add(task1)
            |                 print('got char on stdin')
            |             if task2 in done:
            |                 task2 = loop.create_task(sig_queue.get())
            |                 pending.add(task2)
            |                 print('got signal')
            | 
            |     asyncio.run(amain())
        
         | aumerle wrote:
         | It's not even slightly tricky, just use self pipe. I have no
         | idea why the maintainer of make rejected it. He says select has
         | different signatures, but select is in POSIX, so unless he is
         | porting to a non POSIX platform, it's irrelevant and even if he
         | is, I doubt it is that hard to write a wrapper to abstract the
         | non POSIX compatible implementation of select. Then he
         | complains about needing to do CLOEXEC on the self pipe. This is
         | trivial (one line) on Linux using pipe2 and about 5 lines of
         | code on other platforms. Given that he says make does not use
         | threads, it's also perfectly robust without pipe2.
         | 
         | Opting for a harder algorithm just to avoid a few lines of
         | compatibility shims seems like very much the wrong tradeoff.
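         |
         | A minimal sketch of the self-pipe approach, here in Python
         | rather than C, assuming `signal.set_wakeup_fd` as the
         | mechanism that writes a byte on signal delivery. The function
         | names are hypothetical and this is not make's code:

```python
import os
import select
import signal

def install_self_pipe(signum):
    """Route `signum` into a pipe; return the pipe's read end."""
    r, w = os.pipe()           # Python creates these fds non-inheritable
    os.set_blocking(r, False)
    os.set_blocking(w, False)  # set_wakeup_fd needs a non-blocking fd
    # A Python-level handler must be installed, otherwise nothing is
    # written to the wakeup fd when the signal arrives.
    signal.signal(signum, lambda s, f: None)
    signal.set_wakeup_fd(w)
    return r

def wait_for_token_or_signal(token_fd, wakeup_fd, timeout=None):
    """Wait on both fds at once; return a set of 'token' / 'signal'."""
    readable, _, _ = select.select([token_fd, wakeup_fd], [], [], timeout)
    events = set()
    if wakeup_fd in readable:
        os.read(wakeup_fd, 512)  # drain the pending signal bytes
        events.add("signal")
    if token_fd in readable:
        # Another process may still grab the token first, so the caller
        # should read token_fd in non-blocking mode.
        events.add("token")
    return events
```

         | The point is that a single select() call covers both the
         | token pipe and signal delivery, so there is no window in which
         | a signal arrives just before a blocking read.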
        
           | gpderetta wrote:
           | > so unless he is porting to a non POSIX platform
           | 
            | Very likely gmake works on non-POSIX platforms. Whether there
            | is a requirement that the jobserver also works there I don't
            | know.
            | 
            | It also needs to work on non-Linux platforms. So a portable
            | solution (or multiple non-portable ones) is needed.
        
             | aumerle wrote:
             | Yes, but this is two well known and understood functions we
             | are talking about, wrapping them is really not that hard.
        
           | badmintonbaseba wrote:
           | I'm not seeing python creating a self-pipe for
           | `loop.add_signal_handler`.
           | 
           | edit: Oh, it uses `signal.set_wakeup_fd`, interesting.
           | 
           | https://docs.python.org/3/library/signal.html#signal.set_wak.
           | ..
           | 
           | edit2: it looks like the fd comes from a self-socket, but
           | yeah, it's the same approach. The function is even called
           | `_make_self_pipe`.
           | 
           | https://github.com/python/cpython/blob/bf21e2160d1dc6869fb23.
           | ..
        
         | zokier wrote:
         | On Linux you got signalfd and on BSDs you got kqueue. It is
         | only difficult if you are avoiding the tools made specifically
         | to address the problem.
        
           | crest wrote:
           | On FreeBSD you also have process descriptors (Linux followed
            | suit a while ago) which provide a race free clean interface
           | to supervise processes even if you're not their reaper (so
           | can't use PIDs reliably). You can add them to kqueue using
           | the EVFILT_PROCDESC filter with the NOTE_EXIT flag to get
           | notified when the referenced process exits. The exit status
           | you would normally get from waitpid() is already in the
           | struct kevent.
           | 
            | These OS-specific APIs are sometimes required and may make
            | certain use cases a lot easier to implement correctly (or even
            | at all), but a job server for a build system isn't one of
            | those. The make jobs can be required to behave and stay in
            | the process group the job server creates for them, just like
            | shell job control, which can be done with just the POSIX API.
            | You just have to blow the dust off the relevant tomes the
            | ancients left us (e.g. Advanced Programming in the UNIX
            | Environment).
           | 
            | Refuse the temptation and don't make infrastructure tools
            | like gmake depend on OS-specific APIs. Doing so would make
            | the world a worse place for everyone else.
        
             | o11c wrote:
             | PIDs are always reliable for your own direct children.
        
             | danudey wrote:
              | > don't make infrastructure tools like gmake depend on
              | OS-specific APIs
             | 
             | Even if you're going to implement OS-specific APIs you
             | still need a general case for OSes that don't support those
             | APIs or don't support them correctly. That means you still
             | need to solve the original problem regardless, which then
             | means that it doesn't actually save you time and energy to
             | use those APIs but rather creates more development and
             | maintenance work and not less.
             | 
             | If those APIs don't provide you a significant benefit
             | (reliability, performance, etc.) then it's likely not worth
             | implementing them at all if the general case works, and if
             | it doesn't work you have to fix it anyway.
        
             | zokier wrote:
              | > Refuse the temptation and don't make infrastructure tools
              | like gmake depend on OS-specific APIs. Doing so would make
              | the world a worse place for everyone else.
             | 
             | Make is supposed to be the interface, implementations
             | should be able to leverage all the platform capabilities
             | they need, because make is part of that platform.
             | 
              | Forcing the lowest common denominator also makes the world
              | a worse place, as does relying on a single specific
              | implementation.
        
           | cesarb wrote:
           | > On Linux you got signalfd
           | 
           | AFAIK, make's jobserver protocol is much older than signalfd.
        
           | Karellen wrote:
           | Although the HN title says (2015), at the top of the article
           | is:
           | 
           | > June 16th, 2003 (updated September 26th, 2015)
           | 
           | so when it was originally written signalfd(2) would not be
           | available on Linux until 2.6.22 was released in July 2007, 4
           | years later.
        
       | redleader55 wrote:
       | Make is awesome, but it's such a hack ... Writing and designing a
       | build for a complicated project is such a pain because of the
       | recursive calling.
        
         | db48x wrote:
         | Anything you want to build can be done without using recursive
         | invocations of make. You should probably go ahead and
         | recursively run make when building a dependency that is built
         | with make, but almost any other use is a mistake.
        
         | ajross wrote:
         | Recursive make invocation isn't a required thing or a design
         | point, it's not really even recommended. You see it commonly
         | only in the "obvious" situation where you're building a
         | dependency that has its own build integration. And there, make
         | works _better_ than competing tools for precisely the reason
         | that it inherits the jobserver environment and can thus make
         | better use of the build hardware.
         | 
         | I continue to believe that make is and remains the superior
         | technology. Everyone else does like one or two things better
         | (usually just limited to the developer-facing configuration
         | language) and everything else worse.
        
       | pornel wrote:
       | Rust/Cargo has adopted this protocol: https://lib.rs/jobserver
        
         | ajross wrote:
         | Frustrating that almost nothing else has. In particular Ninja
         | has been dragging its feet on this for most of a decade (they
         | have an epic GitHub feature request and like six submitted
         | attempts, none merged). All the "build system" components in
         | the ecosystem want to be the Root of all Execution and don't
         | feel like they should have to play in a sandbox with their
         | competitors.
         | 
         | Really, the older I get and the more I see of the
         | cmake/ninja/Bazel/whatever world... GNU Make was a better tool
         | even 20 years ago.
        
           | rout39574 wrote:
           | Everyone wants to write the build tool in their favorite
           | language, and then stops when the work gets hard. :)
        
           | danudey wrote:
           | There's a lot to hate about make, but looking for good
           | alternatives there are very, very few out there and none of
           | them provided anything extra I was looking for.
           | 
           | The one thing I wish were possible (trivially) in make, and I
           | understand why it's not, is the ability to define a
           | dependency not on whether a file exists or not but on the
           | result of executing a process.
           | 
           | For example, do I need to rebuild this docker container? The
           | current method to determine that is to create a stamp file;
            | create the docker container then
            | `touch .docker-image-created.stamp`. Unfortunately, those
            | stamps can become out of date if the docker container is
            | removed (or old, etc.) so it
           | leads to confusing situations where make's interpretation of
           | the current state is separate from the reality, and in large
           | complex projects where there are lots of interconnected
           | parts, sometimes your only feasible route is to remove
           | everything and completely rebuild from scratch.
           | 
           | There's also an inexplicable lack of debugging output insofar
           | as printing what target make is currently running. It has a
           | ton of debug output, but no option I can find that says "I am
           | running this target right now", "okay I'm done this target
           | now" without manually adding $(info ...) statements to your
           | makefile.
        
             | oso2k wrote:
              | Have you seen this debugging output doc?
              | https://www.gnu.org/software/make/manual/html_node/Parallel-...
        
       | stabbles wrote:
       | This is somewhat outdated. GNU make now uses named pipes by
       | default on platforms that support it.
       | 
       | https://www.gnu.org/software/make/manual/html_node/POSIX-Job...
       | 
       | It's a bit more robust. For example when you have a Makefile like
       | this:
       | 
       |     target:
       |             +python3 script.py $@
       | 
       | where `script.py` itself spawns a sub-process that is jobserver
       | aware (like make, gcc, cargo, ...), you don't have to worry about
       | whether the fds of the python process, needed for communicating
       | with the jobserver, are inherited in the subprocess.
       | 
       | Previously you had to run `subprocess.Popen(...,
       | close_fds=False)` and make sure MAKEFLAGS was inherited.
       | 
       | Now you only have to ensure MAKEFLAGS is propagated, and that the
       | sub-process can access the fifo by path (which is sometimes
       | problematic, for example when running docker).
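       |
       | A minimal sketch of how a sub-process might find the jobserver
       | from MAKEFLAGS, assuming the option spellings GNU make documents
       | (`--jobserver-auth=fifo:PATH`, `--jobserver-auth=R,W`, and the
       | older `--jobserver-fds=R,W`). The function name is hypothetical:

```python
import re

def parse_jobserver_auth(makeflags):
    """Return ('fifo', path), ('fds', (r, w)), or None if absent."""
    result = None
    # MAKEFLAGS may carry several instances; only the last one counts.
    for m in re.finditer(r"--jobserver-(?:auth|fds)=(\S+)", makeflags):
        value = m.group(1)
        if value.startswith("fifo:"):
            # Named pipe: any process that can open the path can join.
            result = ("fifo", value[len("fifo:"):])
        else:
            # Inherited descriptors: only usable if they were not closed.
            r, w = value.split(",")
            result = ("fds", (int(r), int(w)))
    return result
```

       | A caller would pass `os.environ.get("MAKEFLAGS", "")`. In the
       | fifo case it opens the path itself; in the fds case the
       | descriptors only work if every parent in between left them open.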
        
         | xyzzy_plugh wrote:
         | Frustratingly there is still value in the old style jobserver
         | because old versions of GNU make are prevalent in the wild,
         | i.e. on macOS.
         | 
         | I'm at the point where if you're going to require users to
         | install something, rather than a newer version of make, I'd
         | rather require a more modern build tool and disregard make
         | entirely.
        
         | o11c wrote:
         | _Any_ program that closes FDs it doesn't know the origin of is
         | fundamentally broken, and should have bugs filed against it.
         | 
         | If "close all FDs" is a workaround for other bugs, fix _those_
         | bugs too.
        
           | int_19h wrote:
           | This particular argument is about closing inherited FDs in a
           | forked child process. This is a perfectly sensible thing to
           | do if the child doesn't need them, and most child processes
           | do not need FDs for various random files opened by the
           | parent.
        
             | o11c wrote:
             | "Random files opened by the parent" were, as a rule,
             | _deliberately_ given to the child. You would use CLOEXEC if
             | you don't want that.
             | 
             | Please don't let your children trash something you gave
             | them just because they don't appreciate it yet.
        
           | stabbles wrote:
           | It's the default in Python :)
        
           | crest wrote:
           | Sure, but the nice thing about a named pipe is that you can,
           | but don't have to, get a file descriptor (through inheritance
           | or fd passing). Instead you can e.g. put the path to it into
           | an environment variable. Sometimes that's wanted, sometimes
           | not _shrug_.
        
       | abbbi wrote:
       | This comes at just the right time. I'm currently working on a
       | codebase that has evolved over 30-ish years. The makefiles are
       | quite a mess, and for ages the software has been built
       | sequentially, which results in build times being much longer
       | than required.
       | 
       | Of course the project consists of multiple modules which you
       | should be able to build separately, thus recursive calls are
       | quite common amongst the makefiles.
       | 
       | The first tests with only a handful of jobs (-j) already failed
       | at the beginning; I could fix these quite fast (missing makefile
       | targets).
       | 
       | Now I have the situation that on some build systems (with faster
       | CPUs) I still see races, where the same makefile target for a
       | subproject runs more than once at the same time and the runs
       | overwrite each other's target files. On other build systems it
       | works without any issue. However, I've still failed to reproduce
       | the failure manually; it usually happens during automatic builds
       | invoked by Jenkins or GitLab.
       | 
       | Is there a way to make "make" simulate those builds so one could
       | tell where the cause for the races is in detail?
        
         | oso2k wrote:
         | > Is there a way to make "make" simulate those builds so one
         | > could tell where the cause for the races is in detail?
         | 
         | Have you tried?
         | 
         |     make --dry-run
        
       ___________________________________________________________________
       (page generated 2024-12-02 23:01 UTC)