Subj : Re: Either I'm stupid, or NPTL 0.29 on RH9 is broken!
To   : comp.programming.threads
From : Joseph Seigh
Date : Tue Jan 25 2005 12:10 pm

On 25 Jan 2005 08:44:27 -0800, Greg Law <glaw@nexwave-solutions.com> wrote:

> Is there a bug in the following code?
>
> It works fine for me on various combinations of Linux distros that I've
> tried it on, except on RH9.  After not too long, all threads except one
> grind to a halt.
>
> compiled simply with:
>
> gcc -D_GNU_SOURCE condvars.c
>
> (if you want to run it for yourself, I recommend redirecting stderr to
> /dev/null or a file; if you look at the stderr output, you can see
> clearly that one thread is continuously signalling the condvar, and all
> other threads are blocked waiting for that c.v.!).
>
> Interestingly, if I change the "signal" to a broadcast, everything
> seems to be fine.   Also, if you stop the process and restart (via ^z
> and "fg"), it starts again for a bit, before all threads but one grind
> to a halt.  Which thread continues seems to be random.
>
> I've tried this on two different RH9 machines: a dual-Opteron machine
> (running in 32-bit mode), and a single P4 -- both show the same
> results.
>
> Assuming the code below isn't stupid, then is this a known issue in the
> shipped NPTL?
>
> Here comes the code!
(snip)

Your signaling thread is getting the lock again before any woken thread
gets a chance to get the lock.  The woken threads are probably trying
to get the lock at that point.  The locks are adaptive, not FIFO, so
the behavior you're seeing is allowed.  If you call sched_yield() after
unlocking the mutex and before trying to reacquire it, you should see
this behavior go away.
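
A minimal sketch of that suggestion, for illustration only.  It is not
the snipped program: run_demo, waiter, pending, done and consumed are
made-up names, and the one-signaler/N-waiters layout is an assumption.
Note that sched_yield() is only a hint to the scheduler; nothing here
makes the mutex FIFO.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define MAX_WAITERS 16          /* arbitrary limit for this sketch */

/* Hypothetical shared state, all protected by `lock`. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int  pending;            /* wakeups posted but not yet consumed */
static int  done;               /* set once the signaler has finished */
static long consumed;           /* total wakeups consumed by all waiters */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        /* Always re-check the predicate: wakeups may be spurious. */
        while (pending == 0 && !done)
            pthread_cond_wait(&cond, &lock);
        if (pending == 0)       /* done is set and nothing is left */
            break;
        pending--;
        consumed++;
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start nwaiters waiters, post `rounds` wakeups, return how many
 * wakeups the waiters consumed. */
long run_demo(int nwaiters, int rounds)
{
    pthread_t tid[MAX_WAITERS];
    int i, rc;

    assert(nwaiters > 0 && nwaiters <= MAX_WAITERS);
    pending = 0;
    done = 0;
    consumed = 0;

    for (i = 0; i < nwaiters; i++) {
        rc = pthread_create(&tid[i], NULL, waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }

    for (i = 0; i < rounds; i++) {
        pthread_mutex_lock(&lock);
        pending++;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        /* Give a woken waiter a chance to take the mutex before
         * this thread tries to reacquire it. */
        sched_yield();
    }

    /* Tell the waiters to exit once the remaining work is drained. */
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    for (i = 0; i < nwaiters; i++)
        pthread_join(tid[i], NULL);

    return consumed;
}
```

Build with something like "gcc -pthread".  The value returned should
equal rounds regardless of the yield, since the pending counter keeps a
posted wakeup from being lost when no thread is waiting.  What the yield
may change is how the work is spread across the waiters, not whether it
gets done.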

-- 
Joe Seigh
