[HN Gopher] Murder Mystery: GCC builds failing after sbuild refa...
___________________________________________________________________
Murder Mystery: GCC builds failing after sbuild refactoring
Author : pabs3
Score : 105 points
Date : 2024-12-22 07:31 UTC (15 hours ago)
(HTM) web link (www.linux.it)
(TXT) w3m dump (www.linux.it)
| rurban wrote:
| I had a similar murder mystery investigation over the last few
| months, and had to rewrite the whole murder concept. In the end,
| getting rid of signals (I hate signals) and replacing them with
| messages didn't work, so I embraced putting the processes into a
| proper group and killing them all at once. With some exceptions.
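This is a minimal C sketch of that approach. It assumes "a proper group" means a POSIX process group: the children are forked into one new group with setpgid(), and kill() with a negative pid signals the whole group in a single call. spawn_group and kill_group are hypothetical helper names and do not come from the comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signal every member of process group `pgid` with one kill() call,
   then reap them. Returns the number of children reaped, or -1. */
static int kill_group(pid_t pgid, int sig)
{
    /* kill(0, ...) hits our own group and kill(-1, ...) hits every
       process we may signal, so refuse ids that would turn into those. */
    if (pgid <= 1)
        return -1;
    if (kill(-pgid, sig) < 0)
        return -1;
    int reaped = 0;
    while (waitpid(-pgid, NULL, 0) > 0)   /* any child in that group */
        reaped++;
    return reaped;
}

/* Fork `n` sleeping children into one new process group led by the
   first child. Returns the group id, or -1 on error. */
static pid_t spawn_group(int n)
{
    pid_t pgid = 0;                       /* 0 means "start a new group" */
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            if (pgid > 0)
                kill_group(pgid, SIGKILL);  /* don't leak earlier children */
            return -1;
        }
        if (pid == 0) {
            setpgid(0, pgid);             /* child: create or join the group */
            pause();                      /* sleep until a signal arrives */
            _exit(0);
        }
        /* The parent repeats the call so the child is in the group before
           we return, whichever of the two runs first. */
        setpgid(pid, pgid);
        if (pgid == 0)
            pgid = pid;
    }
    return pgid;
}
```

Usage: `pid_t g = spawn_group(3);` followed later by `kill_group(g, SIGTERM);`. The guard in kill_group is there because a bogus id of 0 or 1 would otherwise become a much wider kill.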
| IshKebab wrote:
| The standard Linux process management API is a total mess IMO. It
| doesn't even offer a way to avoid TOCTOU. They've made some
| improvements recently (https://lwn.net/Articles/794707/) but as
| far as I know you still can't implement something like pstree
| without just reading files from `/proc` which is shit.
| formerly_proven wrote:
| Who needs handles for processes when you can have reusable IDs
| instead? /s
|
| For a while now the limit has been around 4 million (2^22 on
| 64-bit kernels) instead of the traditional 32k, which of course
| broke a bunch of stuff that assumed "char pid[5];" is good
| enough. Fun stuff.
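For reference, the compile-time ceiling (PID_MAX_LIMIT) is 4194304 on 64-bit kernels, and the live value is in /proc/sys/kernel/pid_max. A seven-digit PID plus its NUL already needs 8 bytes. The sketch below uses hypothetical helper names. It reads the current limit and formats a /proc path into a buffer sized for any pid_t. The size bound of 3 * sizeof(pid_t) + 1 is a common conservative rule of thumb: each byte adds fewer than three decimal digits, plus one for a sign. snprintf's return value is used to detect truncation instead of silently producing a wrong path.

```c
#include <stdio.h>
#include <sys/types.h>

/* sizeof("/proc/") already counts the NUL; 3 * sizeof(pid_t) + 1 bytes
   covers any pid_t written in decimal, including a sign. */
#define PROC_PATH_LEN (sizeof("/proc/") + 3 * sizeof(pid_t) + 1)

/* Current PID limit from /proc/sys/kernel/pid_max, or -1 if unreadable. */
static long read_pid_max(void)
{
    long max = -1;
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &max) != 1)
        max = -1;
    fclose(f);
    return max;
}

/* Format "/proc/<pid>" into buf. Returns 0 on success, or -1 if the
   result would not fit (snprintf reports the length it needed). */
static int format_proc_path(char *buf, size_t len, pid_t pid)
{
    int n = snprintf(buf, len, "/proc/%ld", (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```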
|
| The kernel offers this example (abbreviated here), which you can
| use for every access to /proc/<pid> while holding an open pidfd,
| to make sure the two refer to the same process:
|
|     static int pidfd_metadata_fd(pid_t pid, int pidfd)
|     {
|         char path[100];
|         snprintf(path, sizeof(path), "/proc/%d", pid);
|         int procfd = open(path, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
|         if (procfd < 0)
|             return -1;
|         /* Signal 0 only checks that the pidfd's process still exists;
|            EPERM means it exists but we may not signal it. If it is
|            gone, the PID may have been recycled, so drop the fd. */
|         if (sys_pidfd_send_signal(pidfd, 0, NULL, 0) < 0 && errno != EPERM) {
|             close(procfd);
|             procfd = -1;
|         }
|         return procfd;
|     }
|
| And these days you can even poll pidfds (readable = exited)
| and pass P_PIDFD to waitid. So the very basic process
| management cycle can be done through pidfds nowadays.
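This is a minimal sketch of that basic cycle for a child we forked ourselves. It assumes Linux 5.4 or later: pidfd_open() arrived in 5.3 and waitid(P_PIDFD) in 5.4. glibc only gained a pidfd_open() wrapper later, so the sketch calls it through syscall(). The function name is hypothetical. An unreaped child's PID cannot be reused, so the pidfd is guaranteed to refer to that child even if it has already exited.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older headers (assumption: 434 is pidfd_open's generic
   syscall number, and 3 is the kernel's value for P_PIDFD). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef P_PIDFD
#define P_PIDFD 3
#endif

/* Fork a child that exits with `code`, wait for it via a pidfd (poll()
   reports readable once it exits), and reap it with waitid(P_PIDFD).
   Returns the exit status, or -1 on error (e.g. no pidfd support). */
static int spawn_and_reap_via_pidfd(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0) {
        waitpid(pid, NULL, 0);            /* don't leave a zombie behind */
        return -1;
    }

    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    while (poll(&pfd, 1, -1) < 0 && errno == EINTR)
        ;                                 /* retry if interrupted */

    siginfo_t info;
    if (waitid(P_PIDFD, (id_t)pidfd, &info, WEXITED) < 0) {
        close(pidfd);
        waitpid(pid, NULL, 0);            /* e.g. 5.3 kernel: reap by pid */
        return -1;
    }
    close(pidfd);
    return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```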
|
| > as far as I know you still can't implement something like
| pstree without just reading files from `/proc` which is shit.
|
| It's also slow! Tools like top/htop/ps -aux etc. use a
| surprising amount of CPU because they have to do a trillion
| trips to the kernel and back. Compare this to the win32
| equivalent where, iirc, there are just one or two trips to the
| kernel to acquire a snapshot and you walk that in user-space
| instead.
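For contrast, here is a rough sketch of the /proc walk that ps- and pstree-style tools are stuck with. It assumes, per proc(5), that the parent PID is field 4 of /proc/<pid>/stat. Every process costs at least an open, a read and a close. Any entry can also vanish between readdir() and fopen(), so failed lookups are skipped rather than treated as errors. read_ppid and count_children are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Parent pid from /proc/<name>/stat, or -1 if the process is gone or
   the file couldn't be parsed. `name` may also be "self". */
static pid_t read_ppid(const char *name)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%s/stat", name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                 /* exited since readdir(): skip it */
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Field 2 (comm) is in parentheses and may contain spaces or ')',
       so parse after the last ')': the state char, then the parent pid. */
    char *p = strrchr(buf, ')');
    long ppid;
    if (p == NULL || sscanf(p + 1, " %*c %ld", &ppid) != 1)
        return -1;
    return (pid_t)ppid;
}

/* Count the direct children of `parent` by scanning all of /proc.
   The result is only a snapshot and may be stale when it returns. */
static int count_children(pid_t parent)
{
    DIR *d = opendir("/proc");
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;              /* not a process directory */
        if (read_ppid(e->d_name) == parent)
            count++;
    }
    closedir(d);
    return count;
}
```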
| alexvitkov wrote:
|     char path[100];
|     snprintf(path, sizeof(path), "/proc/%d", pid);
|
| You're assuming the pid will fit in 93 bytes! The exact same
| mistake as `char pid[5];`
|
| Jokes aside, sometimes userspace deserves to be broken, and
| `char pid[5]` is IMO one of those cases. The fact that we
| treat integers as a scarce reusable resource is bad, and the
| fact that we now have a second-order process identifier to
| cover that up is worse.
| formerly_proven wrote:
| Fun fact: that broken code was in the runtime of a major
| Fortran compiler.
| pjmlp wrote:
| UNIX/POSIX process management.
| josephcsible wrote:
| > It doesn't even offer a way to avoid TOCTOU.
|
| What do you mean? I admit there are cases where it's hard to
| avoid TOCTOU, but I can't think of any where it's impossible.
| IshKebab wrote:
| For example to kill all of the children of a process you
| first have to read all of the files in `/proc` to build a
| process tree, and then send your signals, by which time the
| process tree may have changed.
| maccam94 wrote:
| Yeah, as far as I know the only way to be sure is to put
| the parent process into a cgroup. Then, to kill all of the
| child processes, you have to freeze the cgroup, enumerate
| the pids, send SIGTERM/SIGKILL to all of them, and then
| unfreeze it again.
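This is a sketch of the freeze, enumerate, signal, thaw sequence. It assumes cgroup v2 and a cgroup directory the caller can write to; kill_cgroup and write_cg are hypothetical names. It simplifies two things. First, the kernel completes a freeze asynchronously, so a careful version would wait for "frozen 1" in cgroup.events before reading cgroup.procs. Second, it ignores nested child cgroups. On Linux 5.14 and later, writing "1" to the cgroup's cgroup.kill file does the whole job in one step.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Write `val` to <dir>/<file>; returns 0 on success, -1 on failure. */
static int write_cg(const char *dir, const char *file, const char *val)
{
    char path[512];
    snprintf(path, sizeof path, "%s/%s", dir, file);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(val, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Freeze the cgroup so its members can't fork, signal every pid listed
   in cgroup.procs, then thaw. Returns the number of pids signalled,
   or -1 on error. (Assumption: the freeze has already taken effect.) */
static int kill_cgroup(const char *dir, int sig)
{
    if (write_cg(dir, "cgroup.freeze", "1") < 0)
        return -1;
    char path[512];
    snprintf(path, sizeof path, "%s/cgroup.procs", dir);
    int count = -1;
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        long pid;
        count = 0;
        while (fscanf(f, "%ld", &pid) == 1)
            if (pid > 1 && kill((pid_t)pid, sig) == 0)
                count++;
        fclose(f);
    }
    /* Always thaw: fatal signals act even on frozen tasks, but
       non-fatal ones such as SIGTERM are handled only after thawing. */
    write_cg(dir, "cgroup.freeze", "0");
    return count;
}
```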
| josephcsible wrote:
| Doesn't the new pidfd API you linked give you a race-free
| way to do that?
| IshKebab wrote:
| Maybe. Tbh I was trying to do something like that on
| RHEL8, and I think it's too old for pidfd.
| o11c wrote:
| As long as you aren't trying to reimplement `systemd`
| badly, you really shouldn't have to deal with this.
| IshKebab wrote:
| "As long as you aren't trying to use the flawed API you
| won't run into any of its flaws."
|
| Ok then.
| ijustlovemath wrote:
| With TOCTOU, the solution is often just to try to use the
| resource and handle errors, not to assume that since you
| checked it, it's available.
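Here is a small C illustration of that advice for the classic case, files; the function names are hypothetical. Checking with access() and then creating the file leaves a window in which another process can create it first. O_CREAT | O_EXCL makes the check and the creation one atomic step, so "already exists" arrives as an EEXIST error instead of a stale answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Racy: the file can appear between access() and open(). */
static int create_racy(const char *path)
{
    if (access(path, F_OK) == 0)
        return -1;                                        /* check... */
    return open(path, O_WRONLY | O_CREAT | O_CLOEXEC, 0600); /* ...use */
}

/* Race-free: the kernel checks and creates in one step.
   Returns an fd, -1 if the file already exists, -2 on other errors. */
static int create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
    if (fd >= 0)
        return fd;
    return errno == EEXIST ? -1 : -2;
}
```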
| loeg wrote:
| PID reuse is the classic example here. There is no neat
| solution other than process handles, which Unix/Linux does
| not historically have.
| IshKebab wrote:
| That only works if you know the resource in advance. Doesn't
| work if for example you're trying to delete files matching a
| glob and you have to enumerate them by name and then delete
| them by name.
|
| The filesystem API offers a solution for that. The process
| API doesn't (maybe with pidfd; I haven't tried that API yet
| since it's so new).
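A filesystem-side solution presumably means the *at() family. The directory is opened once and enumerated through that handle, and entries are deleted with unlinkat() relative to the same handle. Swapping out the directory path mid-scan therefore cannot redirect the deletions. Each individual name can still change between readdir() and unlinkat(), and this sketch simply skips failed unlinks. unlink_matching is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <fnmatch.h>
#include <unistd.h>

/* Delete the entries of `dir` whose names match the glob `pattern`.
   Returns the number deleted, or -1 if `dir` can't be opened. */
static int unlink_matching(const char *dir, const char *pattern)
{
    /* Open the directory once; everything below goes through this handle. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dfd < 0)
        return -1;
    DIR *d = fdopendir(dfd);          /* on success, d owns dfd */
    if (d == NULL) {
        close(dfd);
        return -1;
    }
    int deleted = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* FNM_PERIOD: a leading '.' must be matched explicitly, so "*"
           skips ".", ".." and hidden files. */
        if (fnmatch(pattern, e->d_name, FNM_PERIOD) != 0)
            continue;
        /* Relative to the directory handle, not a re-resolved path;
           failures (already removed, is a directory, ...) are skipped. */
        if (unlinkat(dirfd(d), e->d_name, 0) == 0)
            deleted++;
    }
    closedir(d);                      /* also closes dfd */
    return deleted;
}
```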
| Polizeiposaune wrote:
| The problem here wasn't a case of TOCTOU; rather, the program
| was using a magic constant value that was outside the range of
| getpid() but which wasn't outside of the domain of kill(2)'s
| first parameter.
| nneonneo wrote:
| Proper enum types would have solved this problem; e.g. in
| Rust:
|
|     enum ProcessHandle {
|         Pid(pid_t),
|         Invalid,
|         Terminated,
|     }
|
| There is no way to pass a Terminated or Invalid process
| handle to kill, since there is no numeric identifier for it
| anymore. And, any code that wants to get a PID from a
| ProcessHandle has to explicitly handle non-PIDs - making it
| impossible to forget to deal with these special cases.
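The same idea as the Rust enum in the comment above can be approximated in C with a tagged struct; the names here are hypothetical. C cannot stop code from reading h.pid directly, and it cannot force exhaustive matching (compilers only warn via -Wswitch). Still, if every kill() is routed through ph_kill, a sentinel like -1 is rejected instead of being read by kill(2) as "every process".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical C counterpart of the Rust enum: a tag plus a pid that
   is only meaningful for PH_PID. */
enum ph_kind { PH_PID, PH_INVALID, PH_TERMINATED };

struct process_handle {
    enum ph_kind kind;
    pid_t pid;                 /* valid only when kind == PH_PID */
};

/* kill(2) reads 0 as "my process group" and -1 as "every process I can
   signal", so no value <= 0 is ever allowed to become a PH_PID. */
static struct process_handle ph_from_pid(pid_t pid)
{
    if (pid <= 0)
        return (struct process_handle){ PH_INVALID, 0 };
    return (struct process_handle){ PH_PID, pid };
}

/* The only path to kill(): non-PID variants are refused with EINVAL. */
static int ph_kill(struct process_handle h, int sig)
{
    switch (h.kind) {
    case PH_PID:
        return kill(h.pid, sig);
    case PH_INVALID:
    case PH_TERMINATED:
        break;
    }
    errno = EINVAL;
    return -1;
}
```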
| IshKebab wrote:
| Yeah I know, that was just another example of how bad the API
| is.
| ajross wrote:
| Heh, "even". BSD process groups were around for two decades at
| least before anyone even thought of the idea of a TOCTOU
| vulnerability, much less coined an acronym for it. You're just
| asking for too much here. Yes, it's an API you probably
| shouldn't use. No, it's not going away.
|
| The solution for this particular problem is cgroups. The build
| system just wasn't using it.
| usr1106 wrote:
| Excellent writeup!
|
| Some observations unrelated to the bug.
|
| 1. Nice to see some details about the Debian build server
| implementation. I have always found it rather obscure. By
| comparison, openSUSE's OBS is refreshingly open: not only can you
| use their cloud service for free, but you can also fairly easily
| set up your own instance.
|
| 2. Nice to see that the Debian build infra gets updated. I am a
| bit surprised that Perl code gets introduced in 2024, though.
| Isn't Perl a bit out of fashion, and for a reason? Code written
| by others is typically very hard to read. (As I said, this is
| unrelated; it did not cause the bug in TFA.)
| Kwpolska wrote:
| sbuild is written in Perl, so it's not surprising that new code
| for it is written in Perl.
| aragilar wrote:
| I believe installing
| https://packages.debian.org/sid/sbuild-debian-developer-setu...
| will get you most of the way (but I've only ever done manual
| builds of packages, so I may be wrong).
| chubot wrote:
| Are you surprised that programs can live for 10 or 20 years?
|
| Most people who want to rewrite things in a more fashionable
| language don't have experience maintaining working systems for
| 20+ years, the way Debian does.
| eyegor wrote:
| I'll just pitch in as someone who's worked with several 40+
| year old codebases and 3+ year old Perl: the Perl code is
| significantly harder to maintain. The language lends itself
| to unreadable levels of terseness and abuse of regexes. Unless
| you use Perl every day, far more of your time goes into
| deciphering obscure syntax than into understanding the actual
| logic. Even the oldest Fortran/C is much easier to work with.
|
| Except maybe arithmetic gotos designed to minimize the number
| of functions. Those are straight evil and I'm glad no modern
| language supports them.
| temporallobe wrote:
| I also maintain older codebases, and this is how I feel
| about Clojure. It's far too terse and the syntax is nigh
| unreadable unless you use it very often; on top of that,
| there seems to be a culture that discourages comments. OTOH,
| C-family languages seem much more readable even if I haven't
| touched them in years or have never written a line of code
| in them.
|
| On another note, I have done a lot of things with Perl and
| know it well, and I agree that it can be written in a very
| unreadable way, which is of course enabled by its many
| esoteric features and uncommon flexibility in both syntax
| and methodology. It is simultaneously a high-level
| scripting language and a low-level system tool. Indeed, I
| have caught myself writing "clever" code only to come back
| years later to regret it. Instead of "TMTOWTDI", it has
| turned into "TTMWTDI" (there's too many ways to do it).
| shermantanktop wrote:
| > Even the oldest fortran/c is much easier to work with.
|
| Could that be survivor bias? If that old Fortran was hard
| to work with, maybe it would have been rewritten, and left
| the set of "oldest code" in the process.
| dwattttt wrote:
| It's problems like these (mistaking a sentinel value for a valid
| member of a type) that make me think sentinel values are just a
| mistake. In-band signalling must have been responsible for so
| many bugs at this point.
| ajross wrote:
| It's hard though. I mean, it's easy to say you shouldn't do
| this. But that becomes isomorphic to "You shouldn't have NULL".
| And... that's hard, because you really need NULLs for a lot of
| problems. Or "You shouldn't have error code returns from
| syscalls", when you really need those.
|
| It's possible to split it out, but then everyone complains
| about the result. We thought exceptions were going to fix the
| error code problem, but then we collectively decided we hate
| them. OSes and static analyzers and instrumentation tools like
| valgrind/asan/msan have done a really good job of turning NULL
| dereference into a built-in assertion everywhere, but people
| won't use them.
|
| Even Rust, which has probably come closest to a genuine
| solution, has done so only at the cost of immense complexity
| and somewhat fugly syntax. No free lunch.
|
| I think what's notable here isn't the bug itself, which is
| pretty routine. It's the fact that simple bugs can be buried so
| deeply that they get exposed only in a six-hour-long test rig
| on one particular Linux distribution's backend.
| dwattttt wrote:
| Rust's answer is its type system, but Rust is hardly the only
| language that can do this.
|
| If a syscall returns an fd and returns a negative value on
| error, why not define the result as a bitfield with one bit
| as a flag? Then in C there'd at least be a field name you
| could grep for, or grep for its absence.
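This sketch applies that suggestion through a hypothetical open() wrapper. The error flag becomes a named bitfield, so every check of it, or its absence, is something grep can find. Storing errno in the same 31-bit field is an assumption of this sketch, not part of the comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>

/* Hypothetical result type: the error case is a named flag rather than
   "any negative value", so `grep is_err` finds every place it's checked. */
struct fd_result {
    unsigned int is_err : 1;
    unsigned int value : 31;   /* the fd when !is_err, else the errno value */
};

/* Hypothetical wrapper around open(2); no O_CREAT support, hence no mode. */
static struct fd_result open_result(const char *path, int flags)
{
    int fd = open(path, flags | O_CLOEXEC);
    if (fd < 0)
        return (struct fd_result){ .is_err = 1, .value = (unsigned int)errno };
    return (struct fd_result){ .is_err = 0, .value = (unsigned int)fd };
}
```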
| paulddraper wrote:
| > Even Rust, which has probably come closest to a genuine
| solution
|
| Haskell Maybe
|
| Scala Option
|
| Java Optional
|
| These things have existed for over a decade (several decades,
| in the case of Haskell).
| crest wrote:
| All of these didn't just give you something equivalent to C
| malloc()/free() with a parameter and result type. Maybe
| monads are useful in a lot of cases, but they require a safe
| type system that prevents accidentally/negligently ignoring
| errors, plus language features that make it easy to unwrap
| the enum variants (e.g. destructuring pattern matching) and a
| collection of functions to work with them. I know both Rust
| and Haskell have those and suspect the other two examples do
| as well. Without them they would be a pain in the ass to use.
| crest wrote:
| The answer is to make enums explicit e.g. malloc should
| return either a valid allocation or an error, but you can't
| easily do that in C. It takes a richer/better type system to
| make this convenient to use and safe from accidents.
| Depending on your language and compiler the in memory
| representation for a nullable and a non-nullable pointer can
| be the same, but a well designed type system can prevent
| fallible human programmers from confusing one for the other.
| Add some destructuring pattern matching to the language and the
| usability improves instead of suffering. Rust added a whole lot
| more, which made the syntax very noisy.
___________________________________________________________________
(page generated 2024-12-22 23:00 UTC)