Subj : Re: What is the real costs of LOCK on x86 multiprocesor machine?
To : comp.programming.threads
From : Joe Seigh
Date : Thu Aug 04 2005 08:41 am

Mayan Moudgill wrote:
> Joe Seigh wrote:
>
>>>
>> MESI is required to maintain coherency so that only one
>> processor has a cache line exclusive at a time and thus
>> keeping updates from getting lost. If you didn't have
>> false sharing, you wouldn't need this.
>>
>> The big problem is with the strong part of strongly
>> coherent. When you have more cache traffic than you
>> need due to repeated refreshes of invalidated cache
>> lines, that has to slow things down. Hardware designers
>> don't get it.
>>
>
> I'm not quite sure I get what you mean. In the MESI protocol, a shared
> line will get invalidated when some other processor wants to write to
> the line (E or S->I). This means that if a processor is writing to an I
> line, then it has to fetch the new value. That is unavoidable (other
> than the case of false sharing). [Caveat: if the line in the other core
> is E but not M, then the data does not have to be fetched, but you still
> have to wait for the ack]
>
> Now, where do you see an opportunity to optimize the transaction?
> (again, excluding false sharing)

There is no reason not to allow stale reads, since most memory
models allow them. A stale read is much faster than a read stalled
by a cache line refresh. Though since RCU is now a de facto
standard, you need new, as yet unspecified, memory barriers to
support it efficiently.

>
> One optimization which has been talked about (though I don't know
> whether any processor actually implements it) is to allow loads (and
> dependent ops) to execute speculatively against an I line. If it turns
> out that the line was E then the speculative ops are committed, if it
> turns out it was M, then they are restarted.

Transactional memory? Sun is probably going to do it. Somebody
leaked that in the Solaris newsgroups.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
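
For anyone following the MESI exchange quoted above, here is a toy sketch of the invalidate-then-refetch traffic being argued about. It is only an illustration under simplifying assumptions: one cache line, no bus timing, and every write to an I line counted as a fetch (the E-but-not-M caveat from the quoted text is ignored). The names `State`, `Line`, and `ping_pong` are made up for this example and do not come from any real simulator.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class Line:
    """One cache line as seen by several cores (hypothetical toy model)."""

    def __init__(self, n_cores):
        self.states = [State.I] * n_cores
        self.fetches = 0        # times a core had to pull the line in
        self.invalidations = 0  # copies knocked out of other caches

    def _other_holders(self, core):
        return [c for c, st in enumerate(self.states)
                if c != core and st is not State.I]

    def read(self, core):
        if self.states[core] is not State.I:
            return  # hit: M, E and S copies can all be read locally
        self.fetches += 1
        holders = self._other_holders(core)
        for c in holders:
            self.states[c] = State.S  # an M or E copy is downgraded to S
        self.states[core] = State.S if holders else State.E

    def write(self, core):
        if self.states[core] is State.M:
            return  # already the sole owner, nothing to do
        if self.states[core] is State.I:
            self.fetches += 1  # simplification: always fetch on I
        for c in self._other_holders(core):
            self.states[c] = State.I  # E, S or M in another core -> I
            self.invalidations += 1
        self.states[core] = State.M


def ping_pong(rounds):
    """Two cores take turns writing the same line."""
    line = Line(2)
    for _ in range(rounds):
        line.write(0)
        line.write(1)
    return line
```

In this model, `ping_pong(3)` ends with six fetches and five invalidations: every write after the first knocks out the other core's copy, and the next write by that core has to pull the line back in. That repeated refresh is the traffic the post is complaining about.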