Subj : Re: Memory visibility and MS Interlocked instructions
To : comp.programming.threads
From : Sean Kelly
Date : Thu Sep 01 2005 12:57 pm

Let's back up a minute. And forgive me for being pedantic--it's easier for me to follow all this if I reiterate some of what's been said in other posts. According to 7.1 of the IA-32 System Programming Guide:

> Certain basic memory transactions (such as reading or writing a byte in
> system memory) are always guaranteed to be handled atomically. That is, once
> started, the processor guarantees that the operation will be completed before
> another processor or bus agent is allowed access to the memory location. The
> processor also supports bus locking for performing selected memory operations
> (such as a read-modify-write operation in a shared area of memory) that
> typically need to be handled atomically.

So it seems clear that 'atomic' in IA-32 speak implies that (compliant) stores are made immediately globally visible. Assuming this is true, the only concern I can think of is processor reordering of reads and writes. Since (according to you) IA-32 loads have acquire semantics and IA-32 stores have release semantics, this is limited to loads occurring before preceding writes (compared to program order), which seems to be confirmed by the preamble for section 7.2.

From what I can see, the current confusion seems to be caused by this statement in 7.2.2: "Writes from the individual processors on the system bus are NOT ordered with respect to each other." However, this is clarified further on:

> Individually, the processors perform the writes in the same program order,
> but because of bus arbitration and other memory access mechanisms, the order
> that the three processors write the individual memory locations can differ
> each time the respective code sequences are executed on the processors.

Thus, this section seems to be present only to mention that relying on execution timing in a concurrent program is a bad idea.
And this is so obvious I half wonder why this paragraph is even there.

In any case, I suppose it bears mentioning that LOCK seems to be the intended synchronization mechanism (for operations on non-relaxed memory) as it acts as a full membar (at the memory level, just to be explicit). From 7.1.2.2:

> Locked operations are atomic with respect to all other memory operations and
> all externally visible events. . . For the P6 family processors, locked
> operations serialize all outstanding load and store operations (that is, wait
> for them to complete). This rule is also true for the Pentium 4 and Intel
> Xeon processors, with one exception: load operations that reference weakly
> ordered memory types.

Have I got this right?

Sean
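
P.S. To make the one reordering mentioned above concrete (a load being performed ahead of a preceding store to a different location), here is a minimal sketch of the usual two-thread test. It is only an illustration under assumptions: it uses C++11 std::atomic and std::thread rather than anything taken from the Intel manual, and count_both_zero is a name made up for this example. With release stores and acquire loads (roughly what plain IA-32 stores and loads are said to provide in this thread), both threads are allowed to read 0. With sequentially consistent ordering, which x86 compilers typically implement by using a locked instruction or a fence for the store, that outcome is forbidden.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from any library): runs the store-then-load test
// `iterations` times and returns how often BOTH threads read 0.
// Pass only orderings that are valid for stores and loads respectively,
// e.g. (release, acquire) or (seq_cst, seq_cst).
int count_both_zero(std::memory_order store_order,
                    std::memory_order load_order,
                    int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;  // written by t1, read only after join()
        int r2 = -1;  // written by t2, read only after join()

        // Each thread stores to its own flag, then loads the other one.
        std::thread t1([&] {
            x.store(1, store_order);
            r1 = y.load(load_order);
        });
        std::thread t2([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        t1.join();
        t2.join();

        // r1 == 0 && r2 == 0 means each load took effect before the
        // other thread's store became visible.
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Calling count_both_zero(std::memory_order_seq_cst, std::memory_order_seq_cst, 1000) must return 0. The release/acquire variant may return a nonzero count on a multiprocessor, although a short run that starts fresh threads on every iteration will often fail to show it.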