Subj: Re: lockless low-overhead 'pipes' (w/semaphores)
To:   comp.programming.threads
From: Joe Seigh
Date: Wed Apr 20 2005 07:39 am

On 19 Apr 2005 20:14:04 -0700, wrote:

> Ok, so in the problem you iterate, the producer stores some data, which
> invariably ends up staying in a processor's local cache (say L1 or L2),
> and when the consumer, executing on another processor, attempts to
> access the data, it wouldn't see the cached data located on the other
> processor? Don't today's SMP architectures guarantee visibility in
> situations like this (i.e. cache coherence)?

Cache is transparent. You can't even tell it's there except for
performance effects. And you cannot write a multi-threaded program whose
correctness depends on the presence or absence of cache (except for
Alpha, which is no longer significant). It's as relevant to the issue of
correctness as the color of your computer case. And if you or anybody
else starts talking about the color of your computer case, the rest of
us aren't going to take you or them too seriously.

> If they do - the data is stored and THEN 'len' is updated. If this
> order is preserved, then how could the consumer see the updated 'len'
> before the 'data' has reached memory?
>
> If you're talking about out-of-order reads/writes, at both the compiler
> and processor level - the atomic_* functions imply a memory barrier
> (lock prefix for i486+), so that's a non-issue, right? Or does the lock
> prefix not enforce order, and do I need a separate instruction to
> provide store-store (producer) and load-store (consumer) barriers?

If you are going to program at that level you should become more
familiar with what memory ordering mechanisms the architecture provides.

> Also, if the lock prefix doesn't work the way I thought it did, are
> there any other places that need some kind of memory barrier that I may
> be missing?
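[Editor's note: the "store data, THEN update 'len'" publication pattern being
asked about can be sketched with the portable C11 atomics that later
standardized these ordering mechanisms. The `slot`, `publish`, and `consume`
names are illustrative, not the poster's API.]

```c
#include <stdatomic.h>
#include <string.h>

/* A single-producer/single-consumer slot, a minimal sketch of the
 * "write data, then publish len" ordering the question is about. */
typedef struct {
    char data[64];
    atomic_size_t len;   /* 0 means "empty"; nonzero publishes data */
} slot;

/* Producer: write the payload first, THEN store len with release
 * semantics, so no consumer can observe the new len before the data. */
void publish(slot *s, const char *src, size_t n) {
    memcpy(s->data, src, n);
    atomic_store_explicit(&s->len, n, memory_order_release);
}

/* Consumer: load len with acquire semantics; if it is nonzero, every
 * store made before the matching release store is guaranteed visible. */
size_t consume(slot *s, char *dst) {
    size_t n = atomic_load_explicit(&s->len, memory_order_acquire);
    if (n)
        memcpy(dst, s->data, n);
    return n;
}
```

On x86 a lock-prefixed store for 'len' would also work, but the point of
the release/acquire pair is that it expresses the required ordering
portably rather than relying on one architecture's behavior.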
> However, if the problem you mention exists NOT because of out-of-order
> stores/loads, and the 'len' could reach memory while the 'data' is
> still stuck in the cache, even after a 'lock' prefix, then I don't see
> how any program executing on multiple processors could guarantee
> coherency between data on separate processors. If there's no way to
> guarantee cache coherency, then synchronization would be near
> impossible, no?
>
> I thought I understood this quite well, but clearly I must be missing
> something (unless I am not enforcing an order on 'len' and 'data' via
> the lock prefix).
>
> Could you please elaborate? I really would appreciate the help, thanks.

It's hard to tell what you're doing wrong since you're using a
non-standard API. It does appear, though, that you're assuming that if
one thread orders its memory accesses with memory ordering mechanisms,
any other thread will see those memory accesses in the same order
without having to use any memory ordering mechanisms itself. This is not
true in most memory models.

So if one thread has to do A and B to memory in that order, then it
needs a memory barrier between A and B. And if another thread has to see
B and then A in that order, then it needs a memory barrier between B and
A.

And we haven't even gotten into the issues of atomicity, word tearing,
forward progress, etc. yet.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
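[Editor's note: the pairing rule above - a barrier between A and B in the
producer, and a matching barrier between B and A in the consumer - can be
sketched with explicit C11 fences. The variable names A and B follow the
reply; everything else is illustrative.]

```c
#include <stdatomic.h>
#include <stdbool.h>

int A;                       /* payload, written before the flag      */
atomic_bool B = false;       /* flag, published after the payload     */

/* Producer: store A, barrier, store B - the fence keeps the two
 * stores from being reordered past each other. */
void producer(void) {
    A = 42;                                                 /* A     */
    atomic_thread_fence(memory_order_release);              /* A | B */
    atomic_store_explicit(&B, true, memory_order_relaxed);  /* B     */
}

/* Consumer: load B, barrier, load A - the matching fence on this
 * side is what makes the producer's ordering visible here.  Without
 * it, seeing B == true would NOT guarantee seeing A == 42. */
int consumer(void) {
    while (!atomic_load_explicit(&B, memory_order_relaxed)) /* B     */
        ;
    atomic_thread_fence(memory_order_acquire);              /* B | A */
    return A;                                               /* A     */
}
```

Note that each side supplies its own fence: the release fence in the
producer only does its job when paired with the acquire fence in the
consumer, which is exactly the point of the reply.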