Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Sean Kelly
Date : Thu Sep 01 2005 12:57 pm

Let's back up a minute.  And forgive me for being pedantic--it's easier
for me to follow all this if I reiterate some of what's been said in
other posts.  According to 7.1 of the IA-32 System Programming Guide:

> Certain basic memory transactions (such as reading or writing a byte in
> system memory) are always guaranteed to be handled atomically. That is, once
> started, the processor guarantees that the operation will be completed before
> another processor or bus agent is allowed access to the memory location.  The
> processor also supports bus locking for performing selected memory operations
> (such as a read-modify-write operation in a shared area of memory) that
> typically need to be handled atomically.

So it seems clear that 'atomic' in IA-32 speak implies that (compliant)
stores are made immediately globally visible.  Assuming this is true,
the only concern I can think of is processor reordering of reads and
writes.  Since (according to you) IA-32 loads have acquire semantics
and IA-32 stores have release semantics, this is limited to loads
occurring before preceding writes (compared to program order), which
seems to be confirmed by the preamble for section 7.2.
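
To make that one permitted reordering concrete, here is a rough sketch
of the usual store-buffering test, written with C++11 std::atomic (my
choice of notation, not anything taken from the Intel manual).  The
function name count_both_zero and the fence switch are made up for
illustration.  Each thread stores to one variable and then loads the
other; if a load may be performed ahead of the preceding store, both
threads can read 0.  With a full fence between the store and the load,
the C++ memory model rules that outcome out.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times and returns how often both threads read 0.  `fence` selects
// whether a full barrier sits between each store and the following load.
int count_both_zero(int iterations, bool fence) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;  // written by the threads, read after join()

        std::thread t1([&] {
            x.store(1, std::memory_order_release);
            if (fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_acquire);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_release);
            if (fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_acquire);
        });
        t1.join();
        t2.join();

        // Both loads seeing 0 means each load was satisfied before the
        // other thread's store became visible.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(500, true) should always return 0.  With fence
set to false a nonzero count is allowed, though whether one is ever
observed depends on the hardware, the compiler, and timing.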

From what I can see, the current confusion seems to be caused by this
statement in 7.2.2: "Writes from the individual processors on the
system bus are NOT ordered with respect to each other."  However, this
is clarified further on:

> Individually, the processors perform the writes in the same program order,
> but because of bus arbitration and other memory access mechanisms, the order
> that the three processors write the individual memory locations can differ
> each time the respective code sequences are executed on the processors.

Thus, this section seems to be present only to mention that relying on
execution timing in a concurrent program is a bad idea.  And this is so
obvious I half wonder why this paragraph is even there.  In any case, I
suppose it bears mentioning that LOCK seems to be the intended
synchronization mechanism (for operations on non-relaxed memory) as it
acts as a full membar (at the memory level, just to be explicit).  From
7.1.2.2:

> Locked operations are atomic with respect to all other memory operations and
> all externally visible events. . . For the P6 family processors, locked
> operations serialize all outstanding load and store operations (that is, wait
> for them to complete). This rule is also true for the Pentium 4 and Intel
> Xeon processors, with one exception: load operations that reference weakly
> ordered memory types.
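
As a small illustration of a locked read-modify-write, the sketch below
uses fetch_add from C++11 std::atomic.  On x86, mainstream compilers
typically emit a LOCK-prefixed instruction for it, but that mapping is
an assumption about the compiler rather than something the manual
promises, and locked_increments is a made-up helper name.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each perform `per_thread`
// atomic increments on a shared counter; the final value is returned.
long locked_increments(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write: no increment can be lost,
                // unlike a plain load followed by a plain store.
                counter.fetch_add(1, std::memory_order_seq_cst);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Because every increment is atomic, locked_increments(4, 10000) returns
40000 regardless of how the threads interleave.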

Have I got this right?


Sean
