Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Joe Seigh
Date : Sun Aug 28 2005 09:03 am

David Hopwood wrote:
> Joe Seigh wrote:
>> Actually, no one is relying on it to work. That's what
>> the wrapper macros are for. They let you add a membar if
>> the implementation dependent stuff breaks. But it would be ironic
>> if Intel inadvertently breaks the very stuff people are
>> using to make multi-core processors more scalable. We're
>> just trying to save Intel from themselves. It's not a
>> correctness of implementation issue, it's a performance
>> of implementation issue.
>
> That's not clear at all. If dependent loads break, some lfence instructions
> will have to be added. But they will only have to be added in places where
> the code is actually relying on a load-load constraint, whereas the current
> semantics (whatever they are :-) potentially affect the performance of
> *every* load. If Intel or AMD broke dependent loads, I assume it would be
> because they'd benchmarked the change and found that there was a significant
> performance gain. I don't know what the ratio of all loads to membars is,
> but it's got to be very high, so just a tiny improvement in performance due
> to relaxing the constraints on loads *could* vastly outweigh the cost of the
> added lfences.

Intel most likely benchmarks against their official memory model, and they'd
have no way of distinguishing between normal loads and loads that rely on
dependent-load ordering for correctness, i.e. the loads that would need an
LFENCE after the fact. So their projected performance improvement would be
based only on current LFENCE usage, not on future LFENCE usage, which would be
much greater. The true effect of the change wouldn't be known until after the
processors had changed and existing software (e.g. the Linux kernel) had been
modified to run correctly on the new processors.

I'm not too worried about LFENCE now.
I'm assuming a reasonably optimal implementation of LFENCE will be about as
expensive as a dependent load in situations where all the accesses are
dependent anyway. It could be a problem if Intel implements it as a
serializing instruction rather than as an ordering instruction. MFENCE could
be a problem, though, if your atomic API's semantics force you to use it
instead: to avoid the store penalties you'd have to avoid *all* stores, even
ones to local, non-shared memory.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.