Subj : Re: Memory visibility and the C compiler... ( a bit long )
To   : comp.programming.threads
From : Eric Sosman
Date : Fri Jan 14 2005 03:16 pm



SenderX wrote:
>>SenderX wrote:
>>[...]
>>
>>>>Reordering across "critical instructions" (inline asm
>>>>and/or external asm functions) is still possible (and is in fact
>>>>desirable as long as it doesn't break anything).
>>>
>>>Are you talking about how the C compiler can reorder calls to external
>>>assembled functions?
>>
>>Yes. For example, an external assembled function with the effect of
>>
>>  int f(...) {
>>    return 42;
>>  }
>>
>>can be reordered in all the ways you can imagine. Right?
> 
> I see. I wonder how any pthread implementation gets around this fact? Most
> pthread impls already use processor-specific assembly to implement their
> mutex APIs. Could a compiler reorder accesses to variables around calls to
> a lock-unlock function pair in a way that could mess things up?

    The compiler is a part of the Pthreads implementation,
and must refrain from doing things that would violate the
Pthreads semantics.  For most compilers this is not too
difficult: the things that happen in pthread_mutex_lock(),
say, are modelled well by C's notions of side-effects and
sequence points.  Since the compiler doesn't usually "know"
what global data an external function might read or write,
it won't take the risk of reordering accesses to those data
across function calls.
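
    A minimal sketch of that ordinary case, assuming the
Pthreads library is compiled separately so the compiler sees
only the prototypes in <pthread.h>.  The names shared_counter
and bump_counter are made up for illustration.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Illustrative global, protected by `lock`.  It has external
 * linkage, so an opaque function could in principle touch it. */
long shared_counter = 0;

long bump_counter(void)
{
    long seen;

    /* Opaque external call: the compiler cannot prove that it
     * leaves shared_counter alone, so the increment below stays
     * after this call. */
    pthread_mutex_lock(&lock);

    seen = ++shared_counter;

    /* Same reasoning: the store to shared_counter must be done
     * before this opaque call is made. */
    pthread_mutex_unlock(&lock);

    return seen;
}
```

    Nothing here tells the compiler that `lock` protects
shared_counter; the ordering falls out of its ignorance about
what the two calls do.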

    Things get tougher if some or all of the Pthreads functions
are visible to the compiler (perhaps as inline functions), or
if aggressive optimization is done at what used to be called
"link time."  In such cases the compiler might be fooled: if
it "knows" that the only object touched by pthread_mutex_lock()
is the mutex, it might decide to move the lock call to a point
after the access to the object the lock protects -- after all,
Pthreads makes no explicit connection between the lock and the
protected object, so as far as the compiler can tell they are
independent.
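
    To make the hazard concrete, here is a sketch with a
made-up toy lock whose body is fully visible to the optimizer.
It is NOT a real mutex (no atomicity, no hardware ordering);
toy_lock, guard, and protected_value are hypothetical names.
The second function spells out by hand one reordering a
compiler would be entitled to perform on the first, because
in a single thread the two are indistinguishable.

```c
/* Hypothetical toy lock: just a flag the optimizer can see
 * into.  Deliberately not a working mutex. */
struct toy_lock { int held; };

static inline void toy_lock_acquire(struct toy_lock *l) { l->held = 1; }
static inline void toy_lock_release(struct toy_lock *l) { l->held = 0; }

struct toy_lock guard;
int protected_value;

/* What the programmer wrote: store while the lock is held. */
int update_as_written(int v)
{
    int seen;

    toy_lock_acquire(&guard);
    protected_value = v;
    seen = protected_value;
    toy_lock_release(&guard);
    return seen;
}

/* What the optimizer may produce: the lock code touches only
 * guard.held, so the accesses to protected_value look
 * independent of it and can be moved above the acquire. */
int update_as_reordered(int v)
{
    int seen;

    protected_value = v;          /* moved above the "lock" */
    seen = protected_value;
    toy_lock_acquire(&guard);
    toy_lock_release(&guard);
    return seen;
}
```

    Both functions return the same value and leave the same
state behind when run by one thread, which is all the plain C
rules ask of the compiler.  Only a second thread could tell
them apart.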

    In such an implementation, the Pthreads functions need
to be flagged in some implementation-magical way to prevent
unacceptable optimizations.  There'll be some kind of signal
to the compiler -- perhaps a #pragma or something of the sort --
declaring that despite appearances, pthread_mutex_lock() should
be treated as if it read and wrote all the accessible memory,
so the optimizer dare not move the call, or any memory access
near it, across its nominal sequence points.
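
    One concrete form such a signal can take, as a sketch
assuming GCC or Clang: an empty asm statement with a "memory"
clobber, which those compilers treat as possibly reading and
writing any memory.  The toy lock is hypothetical and is still
not a real mutex; the barrier constrains only the compiler,
not the hardware.

```c
/* Assumption: GCC/Clang extended asm.  The empty statement
 * emits no instructions, but the "memory" clobber makes the
 * compiler assume all memory may be read and written here. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

struct toy_lock { int held; };   /* hypothetical, not a mutex */

struct toy_lock guard;
int protected_value;

static inline void toy_lock_acquire(struct toy_lock *l)
{
    l->held = 1;          /* stand-in for a real atomic acquire */
    COMPILER_BARRIER();   /* later accesses cannot be hoisted above */
}

static inline void toy_lock_release(struct toy_lock *l)
{
    COMPILER_BARRIER();   /* earlier accesses cannot sink below */
    l->held = 0;
}

int guarded_update(int v)
{
    int seen;

    toy_lock_acquire(&guard);
    protected_value = v;          /* stays between the barriers */
    seen = protected_value;
    toy_lock_release(&guard);
    return seen;
}
```

    Even with the lock bodies inlined, the accesses to
protected_value now stay between the two barriers in the
generated code.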

    When you're trying to roll your own synchronization with
something like sequences of asm() code inserted inline amid the
C code, you need to know how to prevent the compiler from being
too aggressive.  For compiler X you may simply "know" that asm()
code never gets rearranged; for compiler Y you may need to
utter the magic _Pragma to prevent the rearrangement.  This is
one of the reasons it's difficult to rebuild your kernel with
a different C compiler ...
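
    The per-compiler knowledge tends to end up in a macro
like the one sketched below.  The GCC/Clang branch uses an
empty asm with a "memory" clobber; the MSVC branch uses the
_ReadWriteBarrier() intrinsic from <intrin.h>.  Any other
compiler is refused rather than guessed at.  COMPILER_BARRIER,
publish, and consume are illustrative names, and the macro
restrains only the compiler: weakly ordered CPUs may need a
hardware fence as well.

```c
#if defined(__GNUC__) || defined(__clang__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#elif defined(_MSC_VER)
#include <intrin.h>
#define COMPILER_BARRIER() _ReadWriteBarrier()
#else
#error "no known compiler barrier for this compiler"
#endif

int payload;           /* data handed from writer to reader */
volatile int ready;    /* flag: nonzero once payload is set */

/* Writer: the compiler must emit the payload store before the
 * flag store. */
void publish(int v)
{
    payload = v;
    COMPILER_BARRIER();
    ready = 1;
}

/* Reader: returns -1 if nothing has been published yet.  The
 * compiler must not read payload before it has tested ready. */
int consume(void)
{
    if (!ready)
        return -1;
    COMPILER_BARRIER();
    return payload;
}
```

    Porting this to a new compiler means finding out what, if
anything, plays the role of COMPILER_BARRIER() there.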

-- 
Eric.Sosman@sun.com
