[HN Gopher] The GNU make jobserver Implementation (2015)
___________________________________________________________________
The GNU make jobserver Implementation (2015)
Author : fanf2
Score : 82 points
Date : 2024-11-26 18:42 UTC (6 days ago)
(HTM) web link (make.mad-scientist.net)
(TXT) w3m dump (make.mad-scientist.net)
| chasil wrote:
| Does the xargs -P option also work this way?
| nwellnhof wrote:
| xargs doesn't recursively invoke itself, so you don't have the
| problem that make has.
| Polizeiposaune wrote:
| The xargs command can invoke arbitrary commands, possibly
| including scripts that themselves invoke xargs, so a system
| using xargs -P recursively could very well benefit from a
| governor limiting the total number of jobs run in parallel.
| It might be unlikely or uncommon (I've certainly never seen
| it in practice) but nothing prevents it from happening.
|
| And if you're using xargs somewhere in the middle of a
| parallel build, having xargs -P respect the jobserver's
| limits would be a nice touch.
| kazinator wrote:
| > convincing myself that reads and writes of one-byte tokens to a
| pipe were atomic and robust
|
| :)
| crest wrote:
| Transfers up to PIPE_BUF are atomic on pipes according to
| POSIX.
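A minimal sketch of the one-byte-token idea discussed above, assuming a POSIX system. The helper names (`make_token_pipe`, `acquire`, `release`, `try_acquire`) and the `+` token byte are illustrative choices for this example, not GNU make's actual code.

```python
import os
import select


def make_token_pipe(extra_jobs):
    """Create a pipe preloaded with one byte per extra job slot."""
    r, w = os.pipe()
    # The whole write is far smaller than PIPE_BUF, so POSIX treats it as atomic.
    os.write(w, b"+" * extra_jobs)
    return r, w


def acquire(r):
    """Block until a token is available, then return it (one byte)."""
    return os.read(r, 1)


def release(w, token):
    """Hand a token back so another job can start."""
    os.write(w, token)


def try_acquire(r):
    """Return a token if one is ready right now, else None.

    Demo only: with several processes sharing the pipe, another reader
    could take the byte between select() and read().
    """
    ready, _, _ = select.select([r], [], [], 0)
    return os.read(r, 1) if ready else None
```

Because each token is a single byte, a reader either gets a whole token or nothing; there is no partial-token state to recover from.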
| badmintonbaseba wrote:
| It's interesting that waiting on a signal and a pipe read at the
| same time is the hard part. Would be interesting to track down
| how this happens to be implemented if you do this from a higher
| level async framework, like python's asyncio.
| awesomerob wrote:
| I'd guess it would be similar to (if not the same as) the
| internal pipe + select() approach mentioned in the post
| (although most likely with epoll or whatever instead of
| select).
| nerdponx wrote:
| As far as I know, none of the tricky Unix details here would be
| handled by Python. The "async" part isn't the problem, it's the
| coordination around syscalls.
| badmintonbaseba wrote:
| I wonder how this fails then:
|
|     import asyncio
|     import os
|     import signal
|     import sys
|
|     async def get_stdin_reader():
|         loop = asyncio.get_running_loop()
|         reader = asyncio.StreamReader()
|         protocol = asyncio.StreamReaderProtocol(reader)
|         await loop.connect_read_pipe(lambda: protocol, sys.stdin)
|         return reader
|
|     async def amain():
|         print(f'pid: {os.getpid()}')
|         sig_queue: asyncio.Queue[None] = asyncio.Queue()
|         def sig_handler() -> None:
|             sig_queue.put_nowait(None)
|         loop = asyncio.get_running_loop()
|         loop.add_signal_handler(signal.SIGUSR1, sig_handler)
|         reader = await get_stdin_reader()
|         task1 = loop.create_task(reader.read(1))
|         task2 = loop.create_task(sig_queue.get())
|         pending = {task1, task2}
|         while True:
|             done, pending = await asyncio.wait(
|                 pending, return_when=asyncio.FIRST_COMPLETED)
|             if task1 in done:
|                 task1 = loop.create_task(reader.read(1))
|                 pending.add(task1)
|                 print('got char on stdin')
|             if task2 in done:
|                 task2 = loop.create_task(sig_queue.get())
|                 pending.add(task2)
|                 print('got signal')
|
|     asyncio.run(amain())
| aumerle wrote:
| It's not even slightly tricky, just use self pipe. I have no
| idea why the maintainer of make rejected it. He says select has
| different signatures, but select is in POSIX, so unless he is
| porting to a non POSIX platform, it's irrelevant and even if he
| is, I doubt it is that hard to write a wrapper to abstract the
| non POSIX compatible implementation of select. Then he
| complains about needing to do CLOEXEC on the self pipe. This is
| trivial (one line) on Linux using pipe2 and about 5 lines of
| code on other platforms. Given that he says make does not use
| threads, it's also perfectly robust without pipe2.
|
| Opting for a harder algorithm just to avoid a few lines of
| compatibility shims seems like very much the wrong tradeoff.
| gpderetta wrote:
| > so unless he is porting to a non POSIX platform
|
| Very likely gmake works on non-posix platforms. Whether there
| is a requirement that the jobserver also works there I don't
| know.
|
| It also needs to work on non-linux platforms. So a portable
| solution (or multiple non-portable ones) is needed.
| aumerle wrote:
| Yes, but this is two well known and understood functions we
| are talking about, wrapping them is really not that hard.
| badmintonbaseba wrote:
| I'm not seeing python creating a self-pipe for
| `loop.add_signal_handler`.
|
| edit: Oh, it uses `signal.set_wakeup_fd`, interesting.
|
| https://docs.python.org/3/library/signal.html#signal.set_wak...
|
| edit2: it looks like the fd comes from a self-socket, but
| yeah, it's the same approach. The function is even called
| `_make_self_pipe`.
|
| https://github.com/python/cpython/blob/bf21e2160d1dc6869fb23...
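A rough sketch of the same approach outside asyncio, assuming a POSIX system and code running in the main thread: `signal.set_wakeup_fd` makes CPython write the signal number into a pipe, and `select()` then waits on that pipe and a data descriptor together. The function names `install_wakeup_pipe` and `wait_for_event` are made up for this example.

```python
import os
import select
import signal


def install_wakeup_pipe(signum):
    """Route `signum` into a non-blocking pipe; return (read_fd, write_fd)."""
    wake_r, wake_w = os.pipe()
    os.set_blocking(wake_r, False)
    os.set_blocking(wake_w, False)  # set_wakeup_fd requires a non-blocking fd
    # A handler must be installed, otherwise the default action
    # (process termination, for SIGUSR1) would still apply.
    signal.signal(signum, lambda _signum, _frame: None)
    # CPython's C-level handler writes the signal number as a single byte.
    signal.set_wakeup_fd(wake_w)
    return wake_r, wake_w


def wait_for_event(wake_r, data_fd, timeout=None):
    """Wait for a signal or one byte of data; return a list of events."""
    ready, _, _ = select.select([wake_r, data_fd], [], [], timeout)
    events = []
    if wake_r in ready:
        for number in os.read(wake_r, 64):  # drain pending signal bytes
            events.append(("signal", signal.Signals(number).name))
    if data_fd in ready:
        events.append(("data", os.read(data_fd, 1)))
    return events
```

A signal that arrives just before `select()` is entered is not lost, because its byte is already sitting in the pipe when the wait starts.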
| zokier wrote:
| On Linux you got signalfd and on BSDs you got kqueue. It is
| only difficult if you are avoiding the tools made specifically
| to address the problem.
| crest wrote:
| On FreeBSD you also have process descriptors (Linux followed
| suit a while ago) which provide a race-free, clean interface
| to supervise processes even if you're not their reaper (so
| can't use PIDs reliably). You can add them to kqueue using
| the EVFILT_PROCDESC filter with the NOTE_EXIT flag to get
| notified when the referenced process exits. The exit status
| you would normally get from waitpid() is already in the
| struct kevent.
|
| These OS specific APIs are sometimes required and may make
| certain usecases a lot easier to implement correctly (or even
| at all), but a job server for a build system isn't one of
| those. The make jobs can be required to behave and stay in
| the process group the job server creates for them just like
| shell job control. Which can be done with just POSIX API. You
| just have to blow the dust from the relevant tombs the
| ancients left us (e.g. Advanced Programming in the UNIX
| Environment).
|
| Refuse the temptation and don't make infrastructure tools
| like gmake depend on OS-specific APIs. Doing so would make the
| world a worse place for everyone else.
| o11c wrote:
| PIDs are always reliable for your own direct children.
| danudey wrote:
| > don't make infrastructure tools like gmake depend on
| OS-specific APIs
|
| Even if you're going to implement OS-specific APIs you
| still need a general case for OSes that don't support those
| APIs or don't support them correctly. That means you still
| need to solve the original problem regardless, which then
| means that it doesn't actually save you time and energy to
| use those APIs but rather creates more development and
| maintenance work and not less.
|
| If those APIs don't provide you a significant benefit
| (reliability, performance, etc.) then it's likely not worth
| implementing them at all if the general case works, and if
| it doesn't work you have to fix it anyway.
| zokier wrote:
| > Refuse the temptation and don't make infrastructure tools
| like gmake depend on OS-specific APIs. Doing so would make the
| world a worse place for everyone else.
|
| Make is supposed to be the interface, implementations
| should be able to leverage all the platform capabilities
| they need, because make is part of that platform.
|
| Forcing the lowest common denominator also makes the world a
| worse place, as does relying on a single specific implementation.
| cesarb wrote:
| > On Linux you got signalfd
|
| AFAIK, make's jobserver protocol is much older than signalfd.
| Karellen wrote:
| Although the HN title says (2015), at the top of the article
| is:
|
| > June 16th, 2003 (updated September 26th, 2015)
|
| so when it was originally written signalfd(2) would not be
| available on Linux until 2.6.22 was released in July 2007, 4
| years later.
| redleader55 wrote:
| Make is awesome, but it's such a hack ... Writing and designing a
| build for a complicated project is such a pain because of the
| recursive calling.
| db48x wrote:
| Anything you want to build can be done without using recursive
| invocations of make. You should probably go ahead and
| recursively run make when building a dependency that is built
| with make, but almost any other use is a mistake.
| ajross wrote:
| Recursive make invocation isn't a required thing or a design
| point, it's not really even recommended. You see it commonly
| only in the "obvious" situation where you're building a
| dependency that has its own build integration. And there, make
| works _better_ than competing tools for precisely the reason
| that it inherits the jobserver environment and can thus make
| better use of the build hardware.
|
| I continue to believe that make is and remains the superior
| technology. Everyone else does like one or two things better
| (usually just limited to the developer-facing configuration
| language) and everything else worse.
| pornel wrote:
| Rust/Cargo has adopted this protocol: https://lib.rs/jobserver
| ajross wrote:
| Frustrating that almost nothing else has. In particular Ninja
| has been dragging its feet on this for most of a decade (they
| have an epic GitHub feature request and like six submitted
| attempts, none merged). All the "build system" components in
| the ecosystem want to be the Root of all Execution and don't
| feel like they should have to play in a sandbox with their
| competitors.
|
| Really, the older I get and the more I see of the
| cmake/ninja/Bazel/whatever world... GNU Make was a better tool
| even 20 years ago.
| rout39574 wrote:
| Everyone wants to write the build tool in their favorite
| language, and then stops when the work gets hard. :)
| danudey wrote:
| There's a lot to hate about make, but looking for good
| alternatives there are very, very few out there and none of
| them provided anything extra I was looking for.
|
| The one thing I wish were possible (trivially) in make, and I
| understand why it's not, is the ability to define a
| dependency not on whether a file exists or not but on the
| result of executing a process.
|
| For example, do I need to rebuild this docker container? The
| current method to determine that is to create a stamp file;
| create the docker container then
| `touch .docker-image-created.stamp`. Unfortunately, those stamps
| can become out of
| date if the docker container is removed (or old, etc.) so it
| leads to confusing situations where make's interpretation of
| the current state is separate from the reality, and in large
| complex projects where there are lots of interconnected
| parts, sometimes your only feasible route is to remove
| everything and completely rebuild from scratch.
|
| There's also an inexplicable lack of debugging output insofar
| as printing what target make is currently running. It has a
| ton of debug output, but no option I can find that says "I am
| running this target right now", "okay I'm done this target
| now" without manually adding $(info ...) statements to your
| makefile.
| oso2k wrote:
| Have you seen this debugging output doc?
| https://www.gnu.org/software/make/manual/html_node/Parallel-...
| stabbles wrote:
| This is somewhat outdated. GNU make now uses named pipes by
| default on platforms that support it.
|
| https://www.gnu.org/software/make/manual/html_node/POSIX-Job...
|
| It's a bit more robust. For example when you have a Makefile like
| this:
|
|     target:
|             +python3 script.py $@
|
| where `script.py` itself spawns a sub-process that is jobserver
| aware (like make, gcc, cargo, ...), you don't have to worry about
| whether the fds of the python process, needed for communicating
| with the jobserver, are inherited in the subprocess.
|
| Previously you had to run `subprocess.Popen(...,
| close_fds=False)` and make sure MAKEFLAGS was inherited.
|
| Now you only have to ensure MAKEFLAGS is propagated, and that the
| sub-process can access the fifo by path (which is sometimes
| problematic, for example when running docker).
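A minimal sketch of how a script such as `script.py` might find the jobserver from `MAKEFLAGS`, covering the `--jobserver-auth=fifo:PATH` and `--jobserver-auth=R,W` forms (plus the older `--jobserver-fds=R,W` spelling). The function name `parse_jobserver_auth` is hypothetical, and error handling for malformed values is left out.

```python
import re


def parse_jobserver_auth(makeflags):
    """Return ("fifo", path), ("fds", (r, w)), or None if no jobserver is advertised."""
    result = None
    # MAKEFLAGS may carry several instances; keep the last one seen.
    for match in re.finditer(r"--jobserver-(?:auth|fds)=(\S+)", makeflags):
        value = match.group(1)
        if value.startswith("fifo:"):
            # Named-pipe style: any process that can open the path can join.
            result = ("fifo", value[len("fifo:"):])
        else:
            # Pipe style: two inherited file descriptor numbers, read then write.
            read_fd, write_fd = value.split(",")
            result = ("fds", (int(read_fd), int(write_fd)))
    return result
```

In the pipe style, a Python wrapper would additionally have to keep those two descriptors open in its children, for example with `subprocess.Popen(..., pass_fds=(read_fd, write_fd))`; in the fifo style, passing `MAKEFLAGS` through the environment is enough as long as the path is reachable.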
| xyzzy_plugh wrote:
| Frustratingly there is still value in the old style jobserver
| because old versions of GNU make are prevalent in the wild,
| i.e. on macOS.
|
| I'm at the point where if you're going to require users to
| install something, rather than a newer version of make, I'd
| rather require a more modern build tool and disregard make
| entirely.
| o11c wrote:
| _Any_ program that closes FDs it doesn't know the origin of is
| fundamentally broken, and should have bugs filed against it.
|
| If "close all FDs" is a workaround for other bugs, fix _those_
| bugs too.
| int_19h wrote:
| This particular argument is about closing inherited FDs in a
| forked child process. This is a perfectly sensible thing to
| do if the child doesn't need them, and most child processes
| do not need FDs for various random files opened by the
| parent.
| o11c wrote:
| "Random files opened by the parent" were, as a rule,
| _deliberately_ given to the child. You would use CLOEXEC if
| you don't want that.
|
| Please don't let your children trash something you gave
| them just because they don't appreciate it yet.
| stabbles wrote:
| It's the default in Python :)
| crest wrote:
| Sure, but the nice thing about a named pipe is that you can,
| but don't have to, get a file descriptor (through inheritance or
| fd passing). Instead you can e.g. put the path to it into an
| environment variable. Sometimes that's wanted sometimes not
| _shrug_.
| abbbi wrote:
| This comes at just the right time. I'm currently working on a codebase
| that has evolved over 30-ish years. The makefiles are quite a
| mess and, for ages, the software has been built sequentially,
| which results in build times being much longer than required.
|
| Of course the project consists of multiple modules which you
| should be able to build separately, thus recursive calls are
| quite common amongst the makefiles.
|
| The first test with only a handful of jobs (-j) already failed at
| the beginning; I could fix these quite fast (missing makefile
| targets).
|
| Now I have the situation that on some build systems (with faster
| CPUs) I still see races, where the same makefile target for a
| subproject runs twice at the same time and the two runs overwrite
| each other's target files. On other build systems it works without
| any issue. However, I've still failed to reproduce the failure
| manually; it usually happens during automatic builds invoked by
| Jenkins or GitLab.
|
| Is there a way to make "make" simulate those builds so one could
| tell where the cause for the races is in detail?
| oso2k wrote:
| > Is there a way to make "make" simulate those builds so one
| could tell where the cause for the races is in detail?
|
| Have you tried `make --dry-run`?
___________________________________________________________________
(page generated 2024-12-02 23:01 UTC)