Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Seongbae Park
Date : Thu Sep 01 2005 09:40 pm

Peter Dimov wrote:
> Seongbae Park wrote:
>> Peter Dimov wrote:
>
>> > X = 0, Y = 0
>> >
>> > CPU1:
>> >
>> > st.rel X 1
>> > ld.acq Y
>> >
>> > CPU2:
>> >
>> > st.rel Y 1
>> > ld.acq X
>> >
>> > It is possible for CPU1 and CPU2 to both load 0. This can't happen in a
>> > TSO model (*), because one of the two stores must execute first.
>> > --
>> > (*) I don't know for sure whether SPARC-TSO is really TSO, though.
>>
>> You'd better define what TSO is then.
>
> Total store ordering, i.e. stores are observed by all processors in the
> same order.

That's just the definition of store atomicity, and TSO is about more
than store atomicity.

....

> Right, because a store by itself is not observed by a CPU, but consider
> the following slight modification:
>
> P1: st 1, X; ld X, r1; ld Y, r2
> P2: st 1, Y; ld Y, r3; ld X, r4
>
> If the stores complete in an order that is the same for all processors,
> X,Y for example, then r4 must be 1. That's because r1 and r3 are
> obviously 1 (because of single thread constraints) and since the store
> of Y has been observed by P2, it follows that the store of X must be
> observed as well.

What you described above is not what happens in TSO. TSO allows a load
to return the value written by an earlier store from the same processor
before that store is performed [1]. That is, before "st 1,X" is
performed, "ld X,r1" can return the new value of X while the store is
still sitting in the processor's store buffer. The same holds for P2.
So both "ld Y,r2" and "ld X,r4" can execute while both stores are still
buffered, and each of them reads 0 from memory.

In other words, strictly speaking, TSO doesn't guarantee store atomicity
in all cases: a processor may observe its own stores earlier than every
other processor does. Hence r2 == 0 and r4 == 0 is still possible on TSO.

> Under x86, r2 == r4 == 0 is still possible, because P1 and P2 are
> allowed to observe the stores in a different order. There is no total
> order on stores.
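To make the store-buffer argument concrete, here is a toy operational model in Python. It is a sketch under stated assumptions, not the SPARC V9 axiomatic definition: each CPU is assumed to have a FIFO store buffer, a load first takes the newest buffered store to the same address from its own CPU's buffer, and buffered stores drain to memory in order at arbitrary times. The instruction encoding and the outcomes() helper are hypothetical. It enumerates every interleaving of the P1/P2 program quoted above and checks whether r1 == r3 == 1 with r2 == r4 == 0 is reachable, once with store buffers and once with stores going straight to memory (sequential consistency).

```python
from typing import List, Set, Tuple

# Hypothetical encoding: ("st", var, value) or ("ld", var, reg).
Instr = Tuple[str, str, object]
Outcome = Tuple[Tuple[str, int], ...]


def outcomes(programs: List[List[Instr]], store_buffers: bool) -> Set[Outcome]:
    """Explore every execution and return the set of final register values.

    store_buffers=True:  toy TSO model (per-CPU FIFO store buffer, loads
                         forward from the CPU's own newest buffered store).
    store_buffers=False: sequential consistency (stores hit memory at once).
    """
    results: Set[Outcome] = set()
    n = len(programs)

    def step(pcs, bufs, mem, regs):
        done = True
        for p in range(n):
            if pcs[p] < len(programs[p]):
                done = False
                op, var, arg = programs[p][pcs[p]]
                pcs2 = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if op == "st" and store_buffers:
                    # Store enters p's buffer; other CPUs cannot see it yet.
                    bufs2 = bufs[:p] + (bufs[p] + ((var, arg),),) + bufs[p + 1:]
                    step(pcs2, bufs2, mem, regs)
                elif op == "st":
                    step(pcs2, bufs, {**mem, var: arg}, regs)
                else:
                    # Load: newest matching entry in p's own buffer, else memory.
                    pending = [v for (x, v) in bufs[p] if x == var]
                    value = pending[-1] if pending else mem.get(var, 0)
                    step(pcs2, bufs, mem, {**regs, arg: value})
            if bufs[p]:
                done = False
                # Drain p's oldest buffered store to shared memory.
                (var, val), rest = bufs[p][0], bufs[p][1:]
                step(pcs, bufs[:p] + (rest,) + bufs[p + 1:], {**mem, var: val}, regs)
        if done:  # every program finished and every buffer drained
            results.add(tuple(sorted(regs.items())))

    step((0,) * n, ((),) * n, {}, {})
    return results


P1 = [("st", "X", 1), ("ld", "X", "r1"), ("ld", "Y", "r2")]
P2 = [("st", "Y", 1), ("ld", "Y", "r3"), ("ld", "X", "r4")]
weak = (("r1", 1), ("r2", 0), ("r3", 1), ("r4", 0))

tso = outcomes([P1, P2], store_buffers=True)
sc = outcomes([P1, P2], store_buffers=False)
print(weak in tso)  # True
print(weak in sc)   # False
```

With store buffers the weak outcome shows up; without them it does not. That matches the argument above: what permits it is a CPU reading its own buffered store early, not any disagreement among other CPUs about store order. Dropping the two middle loads gives the plain store-buffering pattern (st X; ld Y / st Y; ld X), and the buffered model lets both loads return 0 there as well.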
Please see my other posting for a correct example that distinguishes TSO
from PC (processor consistency), and hence tests store atomicity.

[1] The specification in SPARC V9 (Appendix D of the SPARC V9
Architecture Manual) isn't quite clear on this point, but D.4.5
implies it.

--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"