Subj : Re: [x86] CMPXCHG timing
To   : comp.programming.threads
From : Michael Pryhodko
Date : Thu Mar 31 2005 05:33 pm

> > I understand that this question is quite platform-specific and can be
> > considered offtopic here, but...
> >
> > Suppose this situation:
> > 1. x86 platform (i.e. P6, Xeon and so on)
> > 2. there are two different memory locations M1 and M2
> > 3. M1 is cached in processor P1's cache
> > 4. M2 is cached in processor P2's cache (P2 is a second processor)
> > 5. M1 = 0 and M2 = 0
> >
> > CMPXCHG &M1, 0, 1
> > CMPXCHG &M2, 0, 1
> >
> > Can it be safely assumed that the execution time for the instructions
> > above will be the same? (Considering that the first instruction will
> > be executed on P1, and the second on P2.)
>
> On average they'll probably be the same. Why would it matter? If
> you're using timing-dependent logic, your logic is wrong.

Unfortunately I need a precise answer :). And not all timing-dependent
logic is wrong.

> > // store buffers are empty
> > mov aligned_mem_addr, value_32bit
> > sfence // to flush store buffers
> >
> > Does the timing NOT depend on the aligned_mem_addr value? Any 'memory
> > bank' or 'DIMM module' issues?
>
> No. Cache line state will affect it. But again, why should you care?

I am trying to "delay" one processor until the others finish a "write and
flush store buffers" sequence, by having it "write to another memory
location" (all memory locations involved are cached).

Could you be more specific about "cache line state"? I wonder how a
"flush store buffers" operation is performed when the buffer contains
only one store -- is it constant-time ("generate and send a message to
the cache, wait for the cache coherency mechanism to finish"), or
something else?

Basically, I have built a sophisticated, ultra-fast lock which works on
x86, but I need some guarantees from the hardware to prove its
correctness. Maybe I should post the idea here?

Bye.
Sincerely yours, Michael.