Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : David Hopwood
Date : Sun Aug 28 2005 05:23 pm

Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>
>>> Actually, no one is relying on it to work.  That's what
>>> the wrapper macros are for.  They let you add a membar if
>>> the implementation-dependent stuff breaks.  But it would be ironic
>>> if Intel inadvertently breaks the very stuff people are
>>> using to make multi-core processors more scalable.  We're
>>> just trying to save Intel from themselves.  It's not a
>>> correctness of implementation issue, it's a performance
>>> of implementation issue.
>>
>> That's not clear at all. If dependent loads break, some lfence 
>> instructions will have to be added. But they will only have to be
>> added in places where the code is actually relying on a load-load
>> constraint, whereas the current semantics (whatever they are :-)
>> potentially affect the performance of *every* load. If Intel or AMD
>> broke dependent loads, I assume it would be because they'd benchmarked
>> the change and found that there was a significant performance gain.
>> I don't know what the ratio of all loads to membars is, but it's got
>> to be very high, so just a tiny improvement in performance due to 
>> relaxing the constraints on loads *could* vastly outweigh the cost
>> of the added lfences.
> 
> Intel most likely benchmarks based on their official memory model, and
> they'd have no way of distinguishing between normal loads and loads that
> rely on dependent loads for proper ordering, loads that would require
> LFENCE after the fact.  So their projections of performance improvement
> would only be based on current LFENCE usage, not future LFENCE usage
> which would be much greater.

I would prefer Intel and AMD to benchmark based on code that actually exists
and that follows the memory model *as specified*, rather than speculate about
the performance of future versions of code that doesn't currently follow
the memory model (if I'm correct that it doesn't).

> So the true effect of the change wouldn't be known until after the
> processors got changed and present software (e.g. Linux kernel) got
> changed to run on the new processors correctly.

C'est la vie.

> I'm not too worried about LFENCE now.  I'm assuming a reasonably optimal
> implementation will be about as expensive as a dependent load in situations
> where all the accesses are dependent anyway.

Right, there's no reason why it should be any more expensive than that.
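
For concreteness, here is a rough sketch of the kind of wrapper macro being
discussed, assuming a GCC-style compiler and inline assembly. The names
DEP_LOAD_FENCE, NEED_DEP_LOAD_FENCE, struct node, shared and read_value are
all made up for illustration; this is not taken from any real code base.

```c
/* Hypothetical switch: build with -DNEED_DEP_LOAD_FENCE=1 if a future
   processor stopped ordering dependent loads. Defaults to off. */
#ifndef NEED_DEP_LOAD_FENCE
#define NEED_DEP_LOAD_FENCE 0
#endif

#if NEED_DEP_LOAD_FENCE && defined(__GNUC__) && \
    (defined(__x86_64__) || defined(__i386__))
/* Real load-load fence, plus a compiler barrier via the "memory" clobber. */
#define DEP_LOAD_FENCE() __asm__ __volatile__("lfence" ::: "memory")
#elif defined(__GNUC__)
/* Compiler barrier only: emits no instruction. */
#define DEP_LOAD_FENCE() __asm__ __volatile__("" ::: "memory")
#else
/* Unknown compiler: no-op placeholder (an assumption, not a guarantee). */
#define DEP_LOAD_FENCE() ((void)0)
#endif

struct node { int value; };

/* Pointer published by a writer thread (the writer side is not shown). */
static struct node *volatile shared;

/* Reader: load the pointer, then load through it (the dependent load). */
static int read_value(void)
{
    struct node *p = shared;   /* first load: the pointer itself */
    DEP_LOAD_FENCE();          /* only needed if dependent loads can reorder */
    return p ? p->value : -1;  /* second load depends on the first */
}
```

The point is that the fence appears only where code actually relies on the
load-load constraint, so the cost is confined to those call sites.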

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
