Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Joe Seigh
Date : Sun Aug 28 2005 09:03 am

David Hopwood wrote:
> Joe Seigh wrote:
>> Actually, no one is relying on it to work.  That's what
>> the wrapper macros are for.  They let you add a membar if
>> the implementation dependent stuff breaks.  But it would be ironic
>> if Intel inadvertently breaks the very stuff people are
>> using to make multi-core processors more scalable.  We're
>> just trying to save Intel from themselves.  It's not a
>> correctness of implementation issue, it's a performance
>> of implementation issue.
> 
> 
> That's not clear at all. If dependent loads break, some lfence instructions
> will have to be added. But they will only have to be added in places where
> the code is actually relying on a load-load constraint, whereas the current
> semantics (whatever they are :-) potentially affect the performance of
> *every* load. If Intel or AMD broke dependent loads, I assume it would be
> because they'd benchmarked the change and found that there was a significant
> performance gain. I don't know what the ratio of all loads to membars is, but
> it's got to be very high, so just a tiny improvement in performance due to
> relaxing the constraints on loads *could* vastly outweigh the cost of the
> added lfences.
> 

Intel most likely benchmarks based on their official memory model, and they'd
have no way of distinguishing between normal loads and loads that rely on
dependent-load ordering, i.e. loads that would require an LFENCE after
the fact.  So their projections of performance improvement would only be
based on current LFENCE usage, not future LFENCE usage, which would be much
greater.  So the true effect of the change wouldn't be known until after the
processors got changed and present software (e.g. the Linux kernel) got changed
to run on the new processors correctly.
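
To make the wrapper-macro idea concrete, here is a minimal sketch, assuming
C11 atomics.  All the names (DEPENDENT_LOAD_FENCE, DEPENDENT_LOADS_NEED_FENCE,
publish, read_value) are made up for illustration and aren't from any
particular library.  The default build relies on the hardware ordering
dependent loads, which is more than the C11 standard itself promises for a
relaxed load; building with the macro defined adds a load fence at exactly the
places that depended on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical wrapper: expands to nothing while dependent loads are ordered
   by the hardware; define DEPENDENT_LOADS_NEED_FENCE to add a load fence. */
#ifdef DEPENDENT_LOADS_NEED_FENCE
#define DEPENDENT_LOAD_FENCE() atomic_thread_fence(memory_order_acquire)
#else
#define DEPENDENT_LOAD_FENCE() ((void)0)
#endif

struct node { int value; };

/* Shared pointer, zero-initialized (NULL) at program start. */
static _Atomic(struct node *) shared_node;

/* Writer: fill in the node, then publish the pointer with release ordering. */
static void publish(struct node *n, int value) {
    assert(n != NULL);
    n->value = value;
    atomic_store_explicit(&shared_node, n, memory_order_release);
}

/* Reader: load the pointer, then do the dependent load through it.
   The wrapper marks the spot where a fence would have to go. */
static int read_value(int fallback) {
    struct node *n = atomic_load_explicit(&shared_node, memory_order_relaxed);
    DEPENDENT_LOAD_FENCE();
    return n ? n->value : fallback;
}
```

The acquire fence is a portable stand-in here.  On x86 a wrapper like this
could expand to _mm_lfence() from <emmintrin.h> instead, if that turned out
to be what the hardware required.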

I'm not too worried about LFENCE now.  I'm assuming a reasonably optimal
implementation will be about as expensive as a dependent load in situations
where all the accesses are dependent anyway.  It could be a problem if
Intel implements it as a serializing instruction rather than as an ordering
instruction.  And MFENCE could be a problem if you're required to use it
instead because of your atomic API semantics, since to avoid store penalties
you'd have to avoid *all* stores, even ones to local non-shared memory.


-- 
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
