Subj : Re: Memory Barriers, Compiler Optimizations, etc.
To   : comp.programming.threads
From : David Schwartz
Date : Sun Feb 06 2005 12:50 pm


"Scott Meyers" <Usenet@aristeia.com> wrote in message 
news:MPG.1c6fc390ebc1c3ed9897c2@news.hevanet.com...

> On Sat, 5 Feb 2005 17:26:11 -0800, David Schwartz wrote:

>>     This is a meaningless requirement because it doesn't say *where* the
>> order needs to be preserved. One could argue that an L2 cache violates this
>> requirement and the C standard requires you to disable the L2 cache for
>> volatile accesses. The problem is that the standard simply calls the order
>> of such accesses part of the 'observable behavior' of the program with no
>> concept of how or where such a thing is to be observed.

> I'd imagine it's the observable behavior of the abstract machine, not any
> real machine, since the entire standard involves only an abstract machine.

    Exactly. The problem is, real machines don't have a (single, 
well-defined) point from which they can be observed. This makes the 
observation requirement meaningless.

> My take would be if x and y are volatile and the generated code (program
> order) accesses x before y, the compiler is off the hook, regardless of
> what happens on any real machine at runtime.

    Nonsense. The C++ standard applies to the entire machine, not just the 
compiler. A compiler does not conform to the C++ standard merely because it 
generates code that might conform (in the sense of an abstract machine) on 
some hypothetical hardware. The system as a whole complies if the compiler 
generates code that conforms when run on a particular piece of hardware.

    The standard is about an abstract machine, not compiled code. In fact, 
it doesn't even require the compiler to generate object code at all. It just 
requires certain particular results.
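
    To make the case under discussion concrete, here is a minimal sketch of 
two volatile objects accessed in program order. The names x, y and 
touch_in_order are made up for illustration and appear nowhere in the thread. 
It only shows what "accesses x before y" means at the source level; it says 
nothing about where that order could be observed on real hardware.

```cpp
// Hypothetical names, for illustration only.
volatile int x = 0;
volatile int y = 0;

// Performs four volatile accesses in source order: write x, write y,
// read x, read y. The compiler may not drop or merge any of them.
int touch_in_order() {
    x = 1;          // first volatile access
    y = 2;          // second volatile access
    int a = x;      // third volatile access
    int b = y;      // fourth volatile access
    return a * 10 + b;
}
```

    Run single-threaded, touch_in_order() returns 12 whatever the hardware 
does underneath, which is the behavior the quoted text is appealing to.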

> After all, single
> threaded-programs will always behave as if x is accessed before y,
> regardless of what the hardware does (at least that's my understanding),

    The sentence above may or may not be true, but it's definitely not about 
the C++ standard. The C++ standard is not modified by how particular 
hardware implementations act. It's either comprehensible in terms of an 
abstract machine or it's not. The observation requirement for volatile 
accesses is, quite literally, incomprehensible in terms of an abstract 
machine.

> and the standards have no concept of more than one thread.  I can imagine
> programmers wanting stronger guarantees, but I can't imagine compiler
> writers offering weaker guarantees.  Are there real compilers where use of
> volatile does not have the effect of totally ordering accesses in the
> generated code to volatile data?

    The standard is not about the generated code itself; it's about what the 
generated code does when it's run on the hardware. You cannot have a 
conforming C++ compiler whose target is "no hardware in particular". The C++ 
standard is in terms of an abstract machine and a conforming compiler must 
conform on some particular piece of hardware.

    One could argue that hardware on which the concept of observability of 
memory accesses is impossible makes it impossible to write a conforming C++ 
compiler. On modern x86 systems, you *cannot* enforce the order of volatile 
variable accesses in the sense that the C++ standard appears to require. 
However, you can't just arbitrarily pick one point in the implementation and 
say "ahh, that's where the C++ standard was talking about observing, between 
the compiler and the processor executing the compiled code" because between 
the processor and the memory controller is an equally valid point of 
observation.
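
    The two candidate observation points can be sketched in code. This is an 
illustration under stated assumptions, not a claim about what the standard 
requires: the function names are hypothetical, std::atomic_thread_fence comes 
from C++11 (which postdates this post), and the statement that it asks the 
processor itself to order the accesses reflects how typical x86 compilers 
treat it, not anything guaranteed for plain volatile objects.

```cpp
#include <atomic>

// Hypothetical names, for illustration only.
volatile int x = 0;
volatile int y = 0;

// Ordered at the compiler level only: the store to x is emitted before the
// load of y, but an x86 store buffer may still let the load complete before
// the store becomes visible to other processors.
int store_then_load() {
    x = 1;
    return y;
}

// Assumption: a sequentially consistent fence between the two accesses also
// asks the processor to keep them in order, which moves the point of
// observation past the compiler/processor boundary.
int store_fence_load() {
    x = 1;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return y;
}
```

    In a single thread the two functions are indistinguishable: both return 
the current value of y. Any difference shows up only to an observer outside 
the executing processor, which is exactly the point the standard leaves 
unspecified.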

    DS
