Subj: Re: CMPXCHG timing
To: comp.programming.threads
From: Michael Pryhodko
Date: Mon Apr 04 2005 08:38 pm

> > 1. LOCK could lock the system bus -- this could have a significant
> > influence on the whole system: the more devices (e.g. processors) are
> > using the bus, the worse the influence becomes. sfence does not have
> > this disadvantage (AFAIK).
>
> This is not true on any modern processor I know of.

LOCK *could* lock the system bus. That does not mean it will do so every
time, but according to the IA-32 manual it *could*. And by the way, while
LOCK locks a cache line, sfence does not lock anything at all.

> > 2. AFAIK sfence performance depends on the size of the store buffer,
> > which means that in this case:
> >
> >   sfence
> >   mov mem_addr, smth
> >   sfence
> >
> > the second sfence will complete much faster than the first.
>
> No. The cost of 'sfence' has to do with sequencing. In order for an
> 'sfence' to work, some stores have to be before the fence and some have
> to be after. This imposes a heavy cost on CPUs that get much of their
> performance from out-of-order execution.

:)) On x86, every store implemented with a 'mov' instruction has 'release'
semantics, i.e. each such store 'happens' only after every preceding memory
access has finished. This was a surprise to me a month ago. That means the
main 'feature' of sfence is to make changes visible to other processors by
flushing the store buffers. See the IA-32 manual on the 'retirement unit';
Alexander Terekhov also posted some very useful links somewhere in this
discussion. That is why I think most of the price of 'sfence' comes from
flushing the store buffers.

> > LOCK-based -- 1493 on average
> > my lock -- 2039 on average
> > my lock with cmpxchg, with 'unlock' augmented to work on a
> > 2-processor PC -- 1045 on average
>
> Benchmarking these kinds of things in tight loops gives unrealistic
> results.

:) Agreed, but they make you think about it. Try playing with this test --
I will be glad to hear any interesting news from you.

Bye.
Sincerely yours,
Michael.