Subj : Re: lockless low-overhead 'pipes' (w/semaphores)
To   : comp.programming.threads
From : David Schwartz
Date : Wed Apr 20 2005 04:17 pm

wrote in message
news:1113966844.629781.250060@g14g2000cwa.googlegroups.com...

>> It depends upon your assumed memory model. If, for example, you're
>> just assuming a POSIX-compliant environment, there's no guarantee the
>> consumer will read the data the producer wrote (instead of stale data
>> stuck in some cache somewhere), since the consumer doesn't hold the
>> same lock the producer held.

> Ok, so in the problem you iterate, the producer stores some data, which
> invariably ends up staying in a processor's local cache (say L1 or L2),

    Or in a register in the CPU. Or in something that hasn't been
invented yet.

> and when the consumer, executing on another processor, would attempt to
> access the data, it wouldn't see the cache data located on the other
> processor? Don't today's SMP architectures guarantee visibility in
> situations like this (i.e. cache collision)?

    The answer is yes and no. There are architectures today that re-order
stores. But the point, then, is that this code is architecture-dependent.

> If they do - the data is stored and THEN 'len' is updated. If this
> order is preserved, then how could the consumer see the updated 'len'
> before the 'data' has reached memory?

    Because writes can be re-ordered.

> If you're talking about out-of-order reads/writes, both compiler and
> processor level - the atomic_* functions imply a memory barrier (lock
> prefix for i486+), so that's a non-issue, right? Or does the lock prefix
> not enforce order and I need a separate instruction to prevent
> store-store (producer) and load-store (consumer) barriers?

    They do? Is this supposed to be x86-specific code or not? You say
"xadd for intel", implying that this code is supposed to work on other
architectures as well.
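
For reference, the data-then-'len' ordering discussed above can be stated explicitly instead of being inferred from what a particular CPU happens to do. The following is only a sketch, and it assumes a C11 compiler with <stdatomic.h>, which postdates this thread. The names struct slot, slot_init, slot_publish and slot_read are hypothetical and do not come from the code under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-shot slot shared by one producer and one consumer. */
struct slot {
    char data[64];
    atomic_size_t len;          /* 0 means "nothing published yet" */
};

void slot_init(struct slot *s)
{
    assert(s != NULL);
    atomic_init(&s->len, 0);
}

/* Producer: write the payload first, then publish its length.
   The release store keeps the preceding data writes from being
   reordered after the store to len. */
int slot_publish(struct slot *s, const void *src, size_t n)
{
    assert(s != NULL);
    if (n == 0 || n > sizeof s->data)
        return -1;
    memcpy(s->data, src, n);
    atomic_store_explicit(&s->len, n, memory_order_release);
    return 0;
}

/* Consumer: an acquire load that observes the published length also
   makes the data written before the release store visible. */
size_t slot_read(struct slot *s, void *dst, size_t cap)
{
    assert(s != NULL);
    size_t n = atomic_load_explicit(&s->len, memory_order_acquire);
    if (n == 0 || n > cap)
        return 0;               /* nothing published, or dst too small */
    memcpy(dst, s->data, n);
    return n;
}
```

With plain (non-atomic) stores to 'len', neither the compiler nor the processor is obliged to keep the two writes in program order, which is the re-ordering described above.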

> Also, if the lock prefix doesn't work the way I thought it did, are
> there any other locations that need types of memory barriers that I may
> be missing?

    Again, is this supposed to be x86-specific code? And is it supposed
to rely only on what's guaranteed to work on future x86 processors?

> However, if the problem you mention exists NOT because of out-of-order
> stores/loads, and that the 'len' could reach memory, but the 'data' is
> still stuck in the cache, even after a 'lock' prefix, then I don't see
> how any program executing on multiple processors could guarantee
> coherency between data on separate processors. If there's no way to
> guarantee cache coherency, then synchronization would be near
> impossible, no?

    One absolute guarantee you have is when each thread accesses the data
under the same lock. There are other things that also provide an absolute
guarantee. Unless you are writing platform-specific code, you should
*only* use the things that provide these guarantees. Otherwise your code
is in the "might happen to work" category.

> I thought I understood this quite well, but clearly I must be missing
> something (unless I am not enforcing an order to 'len' and 'data' via
> the lock prefix).

    Again, what does the behavior of the lock prefix on x86 have to do
with anything? If this is x86-specific code, you need to say that, and you
need to specify whether it's intended to be guaranteed to work on all
future x86 processors or just the ones that exist now.

    DS
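
As an illustration of the "same lock" guarantee described above, here is a minimal sketch of a one-slot pipe in which both the producer and the consumer touch 'data' and 'len' only while holding one POSIX mutex. It assumes a POSIX threads environment. The names struct locked_pipe, locked_pipe_init, locked_pipe_put and locked_pipe_get are hypothetical and are not taken from the code under discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-slot pipe; every access to data and len happens
   under the same mutex. */
struct locked_pipe {
    pthread_mutex_t lock;
    char data[64];
    size_t len;                 /* 0 means the slot is empty */
};

int locked_pipe_init(struct locked_pipe *p)
{
    assert(p != NULL);
    p->len = 0;
    return pthread_mutex_init(&p->lock, NULL);
}

/* Producer side: returns 0 on success, -1 if the slot is full,
   the size is invalid, or the lock could not be taken. */
int locked_pipe_put(struct locked_pipe *p, const void *src, size_t n)
{
    int rc = -1;
    if (pthread_mutex_lock(&p->lock) != 0)
        return -1;
    if (p->len == 0 && n > 0 && n <= sizeof p->data) {
        memcpy(p->data, src, n);
        p->len = n;
        rc = 0;
    }
    pthread_mutex_unlock(&p->lock);
    return rc;
}

/* Consumer side: returns the number of bytes copied, or 0 if the slot
   is empty or dst is too small. */
size_t locked_pipe_get(struct locked_pipe *p, void *dst, size_t cap)
{
    size_t n = 0;
    if (pthread_mutex_lock(&p->lock) != 0)
        return 0;
    if (p->len != 0 && p->len <= cap) {
        n = p->len;
        memcpy(dst, p->data, n);
        p->len = 0;             /* mark the slot empty again */
    }
    pthread_mutex_unlock(&p->lock);
    return n;
}
```

This gives up the lockless property the thread started from. In exchange, nothing in it depends on the memory ordering of any particular processor, only on what POSIX specifies for mutexes.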