Subj : Re: Memory visibility and MS Interlocked instructions
To : comp.programming.threads
From : David Hopwood
Date : Thu Sep 01 2005 12:39 am

Joe Seigh wrote:
> Joe Seigh wrote:
>> David Hopwood wrote:
>>> Joe Seigh wrote:
>>>
>>>> We're talking about whether PC implies loads (not stores) are in
>>>> order or not.
>>>
>>> I don't even know what "loads are in order" would mean. What
>>> Alexander was claiming was that on x86 load implies load.acq. This
>>> *is* consistent with what the P4 manual says, insofar as the manual
>>> makes sense.
>>>
>> I just notice this in the Itanium System Architecture manual.
>>
>> 2.1.2 Loads and Stores
>> In the Itanium architecture, a load instruction has either unordered
>> or acquire semantics while a store instruction has either unordered or
>> release semantics. By using acquire loads (ld.acq) and release stores
>> (st.rel), the memory reference stream of an Itanium-based program can
>> be made to operate according to the IA-32 ordering model. The Itanium
>> architecture uses this behavior to provide IA-32 compatibility. That is,
>> an Itanium acquire load is equivalent to an IA-32 load and an Itanium
>> release store is equivalent to an IA-32 store, from a memory ordering
>> perspective.
>
> Somebody on comp.arch has convinced me this is not so, that Itanium is
> emulating a stronger memory model to be on the safe side.

While it could be emulating a stronger model in principle, I'm now fairly
confident that it isn't, at least not to any significant extent.

Caveat: the definition of RCpc is complicated, particularly the condition
about "special" accesses, and I'm not sure that I understand it as well as
I'd like. However, as much as I do understand is not inconsistent with PC
being equivalent to RCpc with implicit release-after-store and
acquire-before-load.

Both the x86 and Itanium models allow the "read your own writes early"
optimization.

Of course, the *implementation* of any particular Itanium chip may be
stronger than the Itanium model. However, using store.rel and load.acq in
the Itanium model does not appear to be stronger than necessary to emulate
ordinary accesses in the x86 model.

> Plain ld and st.rel won't work due to the IA-32 loads not being totally
> unordered like Itanium loads are. Some of the IA-32 loads can have ordering
> constraints due to processor consistency. Hmm.. Bad wording in the Itanium
> manual also.
>
> So IA-32 loads are for all practical purposes out-of-order, i.e.
> not in-order.

I still don't like your "in-order" terminology (especially since the Intel
and AMD manuals use "in-order" and "out-of-order" for properties of the
implementation). Anyway, it appears that you are wrong here.

> IA-32 stores are in order.
> And you need membars when you need to order loads relative to loads
> or other stores.

"Relative to loads", no. "Or other stores", yes.

-- 
David Hopwood
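
The last exchange ("relative to loads", no; "or other stores", yes) and the
"read your own writes early" remark can be checked against a toy model. The
sketch below is an illustration only and is not taken from the Intel, AMD or
Itanium manuals. It assumes a simple store-buffer machine as a stand-in for
the x86 model: each processor has a FIFO store buffer, a load is satisfied
from the processor's own buffer when possible, and a fence waits until that
buffer is empty. This is a simplification of the PC and RCpc definitions
discussed above. The names outcomes, regs, MP, SB, SB_FENCED and FWD are
made up for the example.

```python
def regs(**values):
    """Canonical, hashable form of a register valuation (hypothetical helper)."""
    return tuple(sorted(values.items()))


def outcomes(threads):
    """Enumerate final register valuations of a toy store-buffer machine.

    Assumed model (a simplification, not a vendor definition):
      ("st", var, val)  appends to the thread's FIFO store buffer
      ("ld", var, reg)  reads the newest own buffered value, else memory
      ("fence",)        may only execute when the own buffer is empty
    Buffered stores drain to memory one at a time, oldest first.
    """
    results, seen = set(), set()

    def explore(pcs, bufs, mem, rs):
        state = (pcs, bufs, mem, rs)
        if state in seen:
            return
        seen.add(state)
        if all(pc == len(t) for pc, t in zip(pcs, threads)) and not any(bufs):
            results.add(rs)
            return
        for i, prog in enumerate(threads):
            if bufs[i]:  # drain the oldest buffered store to memory
                var, val = bufs[i][0]
                new_mem = tuple(sorted({**dict(mem), var: val}.items()))
                explore(pcs, bufs[:i] + (bufs[i][1:],) + bufs[i + 1:], new_mem, rs)
            if pcs[i] < len(prog):  # execute the next instruction
                op = prog[pcs[i]]
                nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op[0] == "st":
                    buf = bufs[i] + ((op[1], op[2]),)
                    explore(nxt, bufs[:i] + (buf,) + bufs[i + 1:], mem, rs)
                elif op[0] == "ld":
                    own = [v for (x, v) in bufs[i] if x == op[1]]
                    val = own[-1] if own else dict(mem).get(op[1], 0)
                    new_rs = tuple(sorted({**dict(rs), op[2]: val}.items()))
                    explore(nxt, bufs, mem, new_rs)
                elif op[0] == "fence" and not bufs[i]:
                    explore(nxt, bufs, mem, rs)

    n = len(threads)
    explore((0,) * n, ((),) * n, (), ())
    return results


# Message passing: two stores on one side, two loads on the other, no fences.
MP = [[("st", "data", 1), ("st", "flag", 1)],
      [("ld", "flag", "r1"), ("ld", "data", "r2")]]

# Store buffering: each thread stores, then loads the other thread's variable.
SB = [[("st", "x", 1), ("ld", "y", "r1")],
      [("st", "y", 1), ("ld", "x", "r2")]]

# Same as SB, with a fence between each store and the following load.
SB_FENCED = [[("st", "x", 1), ("fence",), ("ld", "y", "r1")],
             [("st", "y", 1), ("fence",), ("ld", "x", "r2")]]

# Each thread reads its own store early, then the other thread's variable.
FWD = [[("st", "x", 1), ("ld", "x", "r1"), ("ld", "y", "r2")],
       [("st", "y", 1), ("ld", "y", "r3"), ("ld", "x", "r4")]]

if __name__ == "__main__":
    print("MP sees flag=1, data=0:", regs(r1=1, r2=0) in outcomes(MP))
    print("SB sees r1=0, r2=0:", regs(r1=0, r2=0) in outcomes(SB))
    print("SB_FENCED sees r1=0, r2=0:", regs(r1=0, r2=0) in outcomes(SB_FENCED))
    print("FWD sees 1,0,1,0:", regs(r1=1, r2=0, r3=1, r4=0) in outcomes(FWD))
```

In this toy model, MP never produces r1=1, r2=0 even though it contains no
fence, so loads stay ordered relative to loads and stores relative to stores.
SB can produce r1=0, r2=0 unless a fence separates each store from the load
that follows it, which is the one case where a membar is needed. FWD shows a
processor reading its own write before the other processor can see it.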