Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : David Hopwood
Date : Sun Aug 28 2005 05:23 pm

Joe Seigh wrote:
> David Hopwood wrote:
>> Joe Seigh wrote:
>>
>>> Actually, no one is relying on it to work.  That's what
>>> the wrapper macros are for.  They let you add a membar if
>>> the implementation-dependent stuff breaks.  But it would be ironic
>>> if Intel inadvertently breaks the very stuff people are
>>> using to make multi-core processors more scalable.  We're
>>> just trying to save Intel from themselves.  It's not a
>>> correctness of implementation issue, it's a performance
>>> of implementation issue.
>>
>> That's not clear at all.  If dependent loads break, some lfence
>> instructions will have to be added.  But they will only have to be
>> added in places where the code is actually relying on a load-load
>> constraint, whereas the current semantics (whatever they are :-)
>> potentially affect the performance of *every* load.  If Intel or AMD
>> broke dependent loads, I assume it would be because they'd benchmarked
>> the change and found that there was a significant performance gain.
>> I don't know what the ratio of all loads to membars is, but it's got
>> to be very high, so just a tiny improvement in performance due to
>> relaxing the constraints on loads *could* vastly outweigh the cost
>> of the added lfences.
>
> Intel most likely benchmarks based on their official memory model and
> they'd have no way of distinguishing between normal loads and loads that
> rely on dependent loads for proper ordering, loads that would require
> LFENCE after the fact.  So their projections of performance improvement
> would only be based on current LFENCE usage, not future LFENCE usage
> which would be much greater.
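
To make the "wrapper macro" idea from upthread concrete, here is a minimal
sketch. Everything named in it (LOAD_DEPENDS, DEPENDENT_LOADS_BROKEN,
struct node, publish, read_payload, demo) is hypothetical and made up for
illustration; it is not the macro set any real code base uses. It also
assumes C11 <stdatomic.h> rather than platform-specific fences: the default
path asks only for dependency ordering (memory_order_consume), and defining
DEPENDENT_LOADS_BROKEN switches every wrapped load to an explicit acquire,
which is the "add a membar in one place" escape hatch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical switch: define this if the target stops ordering
 * dependent loads and an explicit barrier becomes necessary. */
#ifdef DEPENDENT_LOADS_BROKEN
#define LOAD_DEPENDS_ORDER memory_order_acquire
#else
/* Dependency ordering only. Compilers commonly treat consume as
 * acquire, so this is a sketch of intent, not a performance claim. */
#define LOAD_DEPENDS_ORDER memory_order_consume
#endif

/* Hypothetical wrapper: only loads that rely on load-load ordering
 * go through this macro; ordinary loads are left untouched. */
#define LOAD_DEPENDS(p) atomic_load_explicit((p), LOAD_DEPENDS_ORDER)

struct node {
    int payload;
};

/* Shared pointer; zero-initialized, i.e. NULL until published. */
static _Atomic(struct node *) shared_node;

/* Writer: fill in the node, then publish the pointer with release. */
void publish(struct node *n, int value)
{
    n->payload = value;
    atomic_store_explicit(&shared_node, n, memory_order_release);
}

/* Reader: load the pointer, then do a dependent load through it.
 * Returns -1 if nothing has been published yet. */
int read_payload(void)
{
    struct node *n = LOAD_DEPENDS(&shared_node);
    return n ? n->payload : -1;
}

/* Single-threaded smoke test of the API shape only; it does not
 * exercise or demonstrate any memory-ordering behaviour. */
int demo(void)
{
    static struct node n;
    assert(read_payload() == -1);
    publish(&n, 42);
    return read_payload();
}
```

The point of the sketch is only that the cost of a stronger ordering is
confined to the loads that go through the wrapper, which is the case both
sides of the argument above are talking about.
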

I would prefer Intel and AMD to benchmark based on code that actually exists
and follows the memory model *as specified*, rather than speculate about the
performance of future versions of code that doesn't currently follow the
memory model (if I'm correct that it doesn't).

> So the true effect of the change wouldn't be known until after the
> processors got changed and present software (e.g. Linux kernel) got
> changed to run on the new processors correctly.

C'est la vie.

> I'm not too worried about LFENCE now.  I'm assuming a reasonably optimal
> implementation will be about as expensive as a dependent load in situations
> where all the accesses are dependent anyway.

Right, there's no reason why it should be any more expensive than that.

-- 
David Hopwood