[HN Gopher] Finding the "second bug" in glibc's condition variable
___________________________________________________________________
Finding the "second bug" in glibc's condition variable
Author : ingve
Score : 246 points
Date : 2022-09-18 12:55 UTC (10 hours ago)
(HTM) web link (probablydance.com)
(TXT) w3m dump (probablydance.com)
| jart wrote:
| If you need really good synchronization primitives while Glibc is
| fixing things then I've had a lot of success with *NSYNC
| https://github.com/google/nsync which is what Cosmopolitan Libc
| uses to implement pthread_cond_t.
| jeffbee wrote:
| Its design is similar to absl::Mutex, which is available to c++
| users. They may also have been written initially by Mike
| Burrows?
| [deleted]
| userbinator wrote:
| Concurrency is bug-prone. Complex code is bug-prone. glibc's CV
| implementation combines both.
|
| In my experience, both writing and teaching concurrency, there's
| a very strong sentiment of "I'll just use another CV/lock/etc."
| whenever people encounter a bug and try to fix it, which seldom
| fully solves the problem but usually makes it even more subtle
| and compounds the number of resulting states.
|
| One of my tricks to increase chances for race conditions and
| deadlocks when testing is to use a modified sampling profiler to
| randomly delay and pause threads, but with such a long sequence
| of steps as this bug requires, that would've been unlikely to
| help. Yet Murphy's Law says that it will happen in practice...
| sbf501 wrote:
| What bothers me is that the OCaml code looks so harmless:
        static void st_masterlock_acquire(st_masterlock * m)
        {
            pthread_mutex_lock(&m->lock);
            while (m->busy) {
                m->waiters++;
                custom_condvar_wait(&m->is_free, &m->lock);
                m->waiters--;
            }
            m->busy = 1;
            pthread_mutex_unlock(&m->lock);
        }
|
| How is this bug not more common with such a simple invocation?
| asveikau wrote:
 | You can probably get away with that simple invocation "most of
 | the time"; what matters is that the higher-level code calling
 | it triggers the bug where other access patterns would not.
|
| Also, I do wonder why they roll their own mutex using condition
| variables. Most people probably would use a mutex and thus
| wouldn't hit the condition variable at all. Condition variables
 | are more niche than mutexes.
| mbac32768 wrote:
| According to Xavier Leroy (OCaml language dev):
|
| > It is possible to use a plain POSIX mutex as master lock,
| but fairness is awful.
|
| https://discuss.ocaml.org/t/is-there-a-known-recent-linux-
| lo...
| jwatte wrote:
| That kinda depends on what custom_condvar_wait() does, right?
| grogers wrote:
| Just 6 days ago they merged a custom condvar implementation
 | based directly on futexes to work around the glibc bug. Before
| that it was just pthread_cond_wait
| ncmncm wrote:
| Someone should be very embarrassed about how they have handled
| this whole process.
|
 | Surely correct, public-domain code for operating condition
 | variables exists - code that does not involve mitigations or
 | stolen signals.
| pengaru wrote:
| In 2022 a POSIX OS with a broken pthread_cond_t implementation is
| a broken OS, full stop. This is _the_ POSIX API for implementing
| sleeping synchronization mechanisms in threaded programs.
|
| The glibc project should be incredibly embarrassed by this,
| assuming there's anyone even still paying attention there.
| nix23 wrote:
| Well it's gnu/linux, a multiuser unix that is not usable as a
 | multiuser system (read the sdf.org story). It's a cheap hw-
 | framework that runs more or less reliably... but no problem, we have
| kubernetes and containers to cover/perfume the stinking parts.
| ajross wrote:
| Skimming this, this isn't something glibc can do reliably. This
| has to be a kernel fix. The underlying FUTEX_OP operation is...
| obtuse. Futexes were originally designed for simple locking and
| semaphores, and only later extended to condition variables, and
| the result isn't pretty. (To be fair: it's a very hard problem on
| kernels that need to scale to hundreds of cores across many
| levels of memory interconnect.)
|
| Trying to fix this in userspace is just about tuning to try to
| suppress the underlying issues, and recovering cleanly when
| things break.
| mannykannot wrote:
| I'm not sure whether to read your comment as disputing the
| author's claim of having a fix for this particular problem, or
| that there will inevitably be other cases where condition
| variables fail unless futexes are fixed in the kernel.
| jart wrote:
| How is this the kernel's fault? The article says
| "pthread_cond_signal() will signal on the wrong futex". Yes I
| understand futexes are tricky, but I don't see how they're
| broken.
| ajross wrote:
| To be clear: I'm not assigning "fault", I'm saying the design
| is flawed. The current futex abstraction forces userspace to
| do things like "pick a thread to wake up", which is
| inherently racy when viewed from outside the kernel. So yes,
| you have to fix this particular issue in glibc because the
| microbug is in glibc. But glibc shouldn't have tried to solve
| the problem in the first place.
|
| IPC is just really hard, we struggle with exactly this in
| Zephyr too (though we have mostly managed to avoid this
| particular kind of mistake as in an RTOS the kernel/API
| boundary is thinner), at a much smaller scale. Add NUMA and
 | cache-locality concerns to the mix and things get harder
| still.
| keepquestioning wrote:
| Would macOS see this issue too?
| saagarjha wrote:
| macOS uses its own pthreads implementation:
| https://github.com/apple-oss-distributions/libpthread
| mort96 wrote:
| This seems like some really good debugging effort, kudos.
|
| But, my takeaway from this is that there's a glibc bug which
| causes deadlocks in correct code, and a proposed fix (even if
| partial) has been available for years, and glibc still hasn't
| shipped a version with the fix so it's up to distros and users to
| patch the broken glibc code.
|
| Reminds me of how a glibc update broke GNU m4 and GNU didn't ship
| a fix to m4 for years, so it was up to distros and users to patch
| the latest release of GNU m4 to make it compile with the latest
| version of GNU glibc.
|
| And then a couple of years later GNU shipped another version of
| glibc which broke m4.
|
| GNU doesn't strike me as an organisation which places a lot of
| value on software quality and reliability.
| stefan_ wrote:
| It's hilarious how many of these stories there are. I remember
| debugging a DNS issue on an embedded system; the resolv.conf
| changed but the program continued trying the old DNS server.
| Turns out GLIBC will read the resolv.conf once on startup and
| never again after (unless specifically invalidated). Except
| most people will never notice this behavior because (1) most
| distributions carry a patch from 2004 that reloads it
| transparently or (2) use libraries that just always invalidate
| it, just to be sure.
|
| This stupidity surfaces everywhere; here is a Go bug report for
| it:
|
| https://github.com/golang/go/issues/21083
| rekado wrote:
| > GNU doesn't strike me as an organisation which places a lot
| of value on software quality and reliability.
|
| This stems from a misunderstanding of what GNU is. Ideally all
| projects under the GNU umbrella would work towards a unified
| operating system, but that's sadly not what it's like. The GNU
| project is not much of a project in the common sense of the
| word, nor is it much of an organisation. This is one of the
| many reasons why https://gnu.tools exists, a subset of GNU with
| common goals and values, including collaborative project
| management.
| xwolfi wrote:
 | It sounds silly for deep theoreticians and a somewhat
| politically active organization like GNU to have wanted to go
| all the way to an OS, the kind of software where you really
| need to throw all the beauty out of the window and solve dumb
 | users' tangible problems, like how to read a Blu-ray from an
| obscure proprietary drive or how to run a game with a
| monopolistic corporation's binary drivers.
|
| GNU can never succeed at it because it can never compromise
 | and maybe that's fine. It's a shame that it's considered a
| failure when they accomplished so much in the OS scaffolding
| part.
| mort96 wrote:
| I kind of get that. And if a change in GNU glibc caused a bug
| in GNU wget, I would hope that wget would release a fix
| relatively quickly, but I wouldn't necessarily have any
| special expectations just because glibc and wget are both
| part of GNU.
|
| But I wish that the projects in the GNU toolchain would at
| the very least collaborate. The combination of glibc + gcc +
| autotools + m4 makes such a core part of any GNU system that
| you'd hope that GNU cares to keep them mutually compatible.
| So when they release an update to glibc which breaks m4 which
| makes it impossible to compile the current version of the GNU
| toolchain using the current version of the GNU toolchain,
| that's extremely disappointing.
| yjftsjthsd-h wrote:
| It does naively seem like a central library like glibc
| would want to run regression integration tests on every new
| version to make sure they're not breaking the utterly
| massive ocean of software that depends on them, and that
| the very obvious first step in that direction would be to
| run tests using software from the same group and/or core
| libraries/software that they know are required for a lot of
| base systems to work (ex. GNU coreutils, GCC, clang,
| busybox). I suspect it does in fact mostly boil down to
 | resourcing problems - although that just raises further
| questions, since I would expect that organizations like
| Google or Red Hat would consider that kind of thing
| important enough to support.
| dezgeg wrote:
| Using GNU Guix (the distribution) should be a logical and
| simple test-bed to see how exactly glibc/gcc pre-release
| would affect downstream packages.
| [deleted]
| userbinator wrote:
| _GNU doesn't strike me as an organisation which places a lot of
| value on software quality and reliability._
|
| That's not surprising, because their main purpose is to promote
| the GPL and "free as in freedom" software. I've heard theories
| that their software is deliberately overly complex and uniquely
| different from other implementations in order to make it easier
| to find GPL violations. In other words, they're optimising for
| something other than quality.
|
 | I've inspected the glibc condition variable implementation
| (while looking at what may be a slightly different bug, or it
| could be another manifestation of this one; not sure at this
| point since it was years ago) and the first thing I noticed was
| that it is disturbingly complex.
| pengaru wrote:
| > That's not surprising, because their main purpose is to
| promote the GPL and "free as in freedom" software.
|
| You're conflating the FSF and the GNU project.
| userbinator wrote:
| Same difference; this is in contrast to e.g. BSD, which is
| far less politically motivated. (And AFAIK they have no
 | known problems in their CV implementation...)
| chrsig wrote:
| Awesome to see the author post TLA+ instructions & model source.
| One day I'll get around to actually learning it. In the meantime,
| what's posted is too involved for me to grok at first glance, but
| I'm still glad to see more real world uses pop up.
| aaaaaaaaaaab wrote:
| ComputerGuru wrote:
| I maintain my own FOSS threads signals/events libraries in C++
 | [0] and in Rust [1] (atop parking_lot as a futex stand-in)
| and I disagree. This has nothing to do with the language and
| everything to do with the semantics of the locks. Writing
| concurrency primitives is HARD and the more functionality your
| API exposes, the more room there is for nuanced bugs in how
| everything interplays with everything else.
|
| [0]: https://github.com/neosmart/pevents
|
| [1]: https://github.com/neosmart/rsevents
| galangalalgol wrote:
| I'm a rust beginner, but it seems like a mutex or lock is
| shared mutable state, and abstracting this with unsafe code
| to implement events is just creating a loophole by which to
| call a function on another thread without having a mutable
| reference? I'm sure I'm missing something, or a lot of
 | things. It just seems like code I wouldn't write in C++ even
| anymore. I typically don't do signalling between threads or
| have them share mutable state. If they do signal it is by
| read or write to a value small enough to be atomic.
| ComputerGuru wrote:
| Rust's synchronization types are built on the semantics of
 | shared objects, with the synchronization object owning the
 | data, access to which is channeled exclusively through the
 | synchronization type.
|
 | Events and signals aren't intended to be used if you're
| trying to share ownership of memory or data - you should
| use the std types in that case. They're for signaling
| between threads or for building your own synchronization
| objects atop of. For example, an auto reset event can be
| used in lieu of a one-bit channel (sort of like Mpmc<()>)
| and manual reset events can be used to implement the same
| in broadcast mode, much more efficiently than the channel
| crates. A common usage is to block until an event is
| available or a shutdown has been requested - in lieu of
| manually polling an AtomicBool and aborting if it is true,
| making your code more responsive (no delay between polls)
| and more efficient (no repeated polling, no busy waits,
| etc). They can be used to signal state transitions or
| readiness, which doesn't have anything to "own", for
| example, the rsevents-extra crate [0] implements a
| countdown event (event becomes available when n outstanding
| tasks have finished in parallel threads) or a semaphore
| (restricting concurrency to a region of code, not limiting
| access to a variable or object).
|
| [0]: https://github.com/neosmart/rsevents-extra
| asveikau wrote:
| It's worth noting that condition variables are a finicky
| interface. Speaking personally, I would typically much rather
| work with something like a semaphore. I also feel like I've
| seen a lot of misuse of condition variables out there.
| ComputerGuru wrote:
| Condition variables were the POSIX equivalent to Win32's
| WaitForMultipleObjects (and IOCP apis like it). Now both
| are available on Windows (and my library makes WFMO
| available on *nix/Linux/macOS).
|
| Semaphores alone can't mimic the same semantics (juggling
| between multiple, disparate locks/sync objects) without
| running into a possible race condition unless you have an
| API that can somehow wait on one of two or more semaphores
| to become available.
| enjoy-your-stay wrote:
| WaitForMultipleObjects (been in Win32 forever) is not
| really related to IO completion Ports, whose API came a
| lot later.
|
| My understanding is that condition variables were
| designed to be a superior alternative to Windows Event
| objects.
|
| I'm glad to read here that others find them fiddly and
| overly awkward to use since that was also my impression
 | vs the consistent and usually easy-to-use Windows
 | HANDLE-based thread synchronization objects. Dave Cutler
| definitely did a good job there IMHO.
| gpderetta wrote:
| Condition variables predate Windows and its predecessor
| VMS by many years (they do not predate unix itself but
| certainly they predate pthreads by decades).
|
| In particular pthreads borrowed them from Mesa, but they
| originally were part of the monitor abstraction designed
| by Hoare.
|
| So they weren't designed to be a superior alternative to
| windows events; they might be considered an alternative
| to events in general but I couldn't easily find any
| history of the events abstraction.
| asveikau wrote:
| I totally disagree with your assessment.
| WaitForMultipleObjects enters the kernel every time and
| blocks on kernel objects. The nearest equivalent is
| select, poll, epoll, kqueue, etc. Newer innovations like
| eventfd start to introduce synchronization objects into
| the mix of what type of kernel object to block on.
|
| A Win32 mutex or semaphore is very heavy and typically
| avoided in a sane high performance application unless you
| need it to cross process boundaries. A lot of the time
| (most?) you will want a win32 critical section, which is
| similar to the modern futex based pthread mutex
| implementation in that it does not enter the kernel
| unless there is contention.
|
| A pthread condition variable is clumsy. You need to give
| it an extra mutex and loop etc. It doesn't have any of
| the simplicity of WaitForMultipleObjects, even with all
| the faults of that interface of which there are many.
| ComputerGuru wrote:
| I was not talking about implementation details but
| semantics. I agree with what you are saying. My own
| implementations are user-mode unless sleeping.
| eloff wrote:
| Rust doesn't help with implementing low level primitives like
| this, you'd have to use unsafe code.
| cesarb wrote:
| > Rust doesn't help with implementing low level primitives
| like this
|
| A charitable reading of the parent comment might be that Rust
| _already implements_ these particular low-level primitives,
| so that by using Rust, you are not using the problematic
| glibc pthread code. See https://blog.rust-
| lang.org/2022/06/30/Rust-1.62.0.html#thinn... for the
| relevant announcement: "Previously, Mutex, Condvar, and
| RwLock were backed by the pthreads library on Linux. [...]
| Rust's standard library now ships with a raw futex-based
| implementation of these locks on Linux, [...]"
| jstimpfle wrote:
| In either case you need to use some implementation of
| synchronisation primitives, and that could contain bugs.
| Where's the difference?
| dundarious wrote:
| glibc's impl is a "raw futex-based" one as well. I'm no
| great fan of glibc as an organization or body of code, but
| the comparative benefits of Rust here are not well
| explained.
| tialaramex wrote:
| Although glibc's implementation is indeed "raw futex-
| based" it is not merely a raw futex, of course. What you
| get from glibc isn't a futex (tricky to use properly) but
| instead a C structure type named pthread_mutex_t. It's
| quite big. 40 bytes!
|
| All Rust wanted was a futex, which for reference is just
| any 32-bit aligned value. The pthread_mutex_t is less
| useful to Rust, yet it's ten times bigger!
|
| So, Mara's new Mutex<T> in the Linux builds of Rust is
| just a futex (32-bits) plus a poison boolean plus your
| type T which is protected by the mutex. If your type is
| Zero Size then this rounds up to 8 bytes, five times
| smaller than pthread_mutex_t. If your type was just a u16
| (unsigned 16-bit integer) then it rounds up to the same
| size, but you're also protecting your 16-bit unsigned
| integer which is nice.
|
| Now, if you're protecting huge data structures with
| mutexes, you don't care, it makes no measurable
| difference. But if you heard futex is small (it is) and
| protected loads of tiny objects with one each (say,
| individual 32-bit floating point X,Y co-ordinate pairs)
| Rust now makes your choice cost roughly what you expected
| instead of bloating your program for no reason.
|
| It's also a bit faster (but probably not enough that you
| should care) and simplifies some other book keeping in
| Rust itself.
|
| [ Above I wrote about Mutex, for Condvar the argument is
| similar but I haven't spent lots of time understanding
| the innards of Condvar whereas I spent a week reading
| Mutex stuff ]
| gpderetta wrote:
| Condvar and mutexes are for very different use cases
| though, and condvars are notoriously hard to get right,
| especially when supporting all the full set of posix
| behaviours with good performance.
| asveikau wrote:
| Which means they have plenty of opportunities to introduce
| their own bugs.
| kibwen wrote:
| That doesn't help here, as Rust also uses glibc for plenty of
| things (although recently it did move away from pthreads in
| certain areas for performance reasons, see
| https://twitter.com/m_ou_se/status/1526211117651050497 ).
| wizeman wrote:
| Just a word of caution: I've tried replacing the mitigation patch
| (which I had been using for 18 months) with the author's minimal
| fix (the first out of the 6 patches, which is supposed to fix the
 | issue) and I immediately ran into a deadlock while building
 | Poly/ML 5.9 (3 times, in fact), whereas I had never
 | encountered this bug before, despite having built Poly/ML many
| times.
|
| This was on a 32-core machine with lots of background compilation
| going on at the same time (i.e. with a high load).
|
| It could be a coincidence or an unrelated bug, but it's also
| possible that the author's patch might not fix the issue
| completely.
| KyleSanderson wrote:
| Apply the entire series... That's what was likely tested the
| most.
| oldtimer1901 wrote:
| Old timers will recall that when Java first became popular in the
| mid 90s it brought to light all sorts of bugs in various
| operating systems' thread implementations - particularly in
| SunOS/Solaris. It is discouraging that such basic bugs still
| exist in GNU/Linux after 25 years. But it's not surprising - if
| anyone tells you that they understand all possible multi-threaded
| scenarios and race conditions in the implementation of these
| libraries - they are mistaken. It's all trial and error.
| [deleted]
| dgan wrote:
 | can we have a "formal semantics for multi-threading"? Why isn't
 | it a thing?
| tsimionescu wrote:
| Another poster shared the Java memory model, and C++ and C have
| them as well. The article we're discussing shows another kind
| of formal model for multithreaded code in TLA+, and actually
| uses it to investigate this bug. There is also Tony Hoare's
| Communicating Sequential Processes formal model, first
| presented in 1978; and an entire field of CS research called
| process calculus.
|
| So, pick a formal model and use it - though note that formal
| methods tend to add significant extra effort to use, and may or
| may not pay off depednig on your domain and desired outcomes.
| kjeetgill wrote:
| There are. They're called memory models. I know Java's is
| pretty readable (see below). I think C and C++ have them too
| now. But when you're building on these things there's always
| lots of room to get them wrong!
|
| [0]: https://en.wikipedia.org/wiki/Java_memory_model
|
| [1]:
| https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.ht...
| baq wrote:
| It's super duper hard, mostly.
| metadat wrote:
 | Is there a way to bundle and use a more up-to-date version of
| glibc inside my docker containers? Is it a straightforward matter
| of creating an intermediate build-base-image where glibc gets
| compiled and copying over the resulting glibc .so artifact files
| to the final image?
|
| I suspect there may be additional concerns, but I'm not a glibc
| expert and have not needed to mess with it much.
|
 | Also, does musl libc [0] have fewer bugs, on average?
|
| [0] https://musl.libc.org/
___________________________________________________________________
(page generated 2022-09-18 23:00 UTC)