Subj : Re: lockless low-overhead 'pipes' (w/semaphores)
To   : comp.programming.threads
From : David Schwartz
Date : Wed Apr 20 2005 04:17 pm


<lindahlb@hotmail.com> wrote in message 
news:1113966844.629781.250060@g14g2000cwa.googlegroups.com...
>> It depends upon your assumed memory model. If, for example, you're just
>> assuming a POSIX-compliant environment, there's no guarantee the consumer
>> will read the data the producer wrote (instead of stale data stuck in some
>> cache somewhere), since the consumer doesn't hold the same lock the
>> producer held.

> Ok, so in the problem you iterate, the producer stores some data, which
> invariably ends up staying in a processor's local cache (say L1 or L2),

    Or in a register in the CPU. Or in something that hasn't been invented 
yet.

> and when the consumer, executing on another processor, would attempt to
> access the data, it wouldn't see the cache data located on the other
> processor? Don't today's SMP architectures guarantee visibility in
> situations like this (i.e. cache collision)?

    The answer is yes and no. There are architectures today that re-order 
stores. But the point is that relying on this makes the code 
architecture-dependent.

> If they do - the data is stored and THEN 'len' is updated. If this
> order is preserved, then how could the consumer see the updated 'len'
> before the 'data' has reached memory?

    Because writes can be re-ordered.
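
    A sketch of what that means for the 'data' and 'len' pair, assuming a 
toolchain with C11 <stdatomic.h> (which postdates this thread and is not 
something plain POSIX threads give you). The struct and function names are 
made up for illustration. The release store on 'len' is what keeps the 
earlier stores to 'data' from becoming visible after it, and the acquire 
load is the matching half on the consumer side. With plain stores and 
loads, nothing stops the compiler or the CPU from re-ordering them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-message buffer; not the original poster's layout. */
struct msg {
    char data[64];
    atomic_size_t len;   /* 0 means nothing has been published yet */
};

/* Producer: fill 'data' first, then publish 'len' with release ordering,
 * so the data stores cannot become visible after the len store. */
void msg_publish(struct msg *m, const char *src, size_t n)
{
    assert(n > 0 && n <= sizeof m->data);
    memcpy(m->data, src, n);
    atomic_store_explicit(&m->len, n, memory_order_release);
}

/* Consumer: read 'len' with acquire ordering, then read 'data'.
 * Returns the number of bytes copied, or 0 if nothing was published. */
size_t msg_try_consume(struct msg *m, char *dst)
{
    size_t n = atomic_load_explicit(&m->len, memory_order_acquire);
    if (n != 0)
        memcpy(dst, m->data, n);
    return n;
}
```

    If a consumer sees a non-zero 'len' from the acquire load, it is also 
guaranteed to see the bytes written before the matching release store. 
That guarantee comes from the C11 memory model, not from POSIX and not from 
any one CPU's behavior.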

> If you're talking about out-of-order reads/writes, both compiler and
> processor level - the atomic_* functions imply a memory barrier (lock
> prefix for i486+), so that's a non-issue, right? Or does the lock prefix
> not enforce order and I need a separate instruction to prevent
> store-store (producer) and load-store (consumer) barriers?

    They do? Is this supposed to be x86-specific code or not? You say "xadd 
for intel", implying that this code is supposed to work on other 
architectures as well.

> Also, if the lock prefix doesn't work the way I thought it did, are
> there any other locations that need types of memory barriers that I may
> be missing?

    Again, is this supposed to be x86-specific code? And is it supposed to 
rely only on what's guaranteed to work on future x86 processors?

> However, if the problem you mention exists NOT because of out-of-order
> stores/loads, and that the 'len' could reach memory, but the 'data' is
> still stuck in the cache, even after a 'lock' prefix, then I don't see
> how any program executing on multiple processors could guarantee
> coherency between data on separate processors.  If there's no way to
> guarantee cache coherency, then synchronization would be near
> impossible, no?

    You have an absolute guarantee if each thread accesses the data under 
the same lock. There are other things that also provide an 
absolute guarantee. Unless you are writing platform-specific code, you 
should *only* use the things that provide these guarantees. Otherwise your 
code is in the "might happen to work" category.
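
    For instance, the portable version of a one-slot pipe, with both 
threads going through the same pthread mutex, could look roughly like this. 
The slot layout and the function names are hypothetical, not the original 
poster's code. The only thing it relies on is the POSIX guarantee for data 
accessed under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64

/* Hypothetical one-slot "pipe": data and len are only touched under lock. */
struct slot {
    pthread_mutex_t lock;
    char data[SLOT_SIZE];
    size_t len;              /* 0 means the slot is empty */
};

/* Producer side: returns 1 if the message was stored, 0 if the slot is full. */
int slot_put(struct slot *s, const char *src, size_t n)
{
    int stored = 0;
    assert(s != NULL && n > 0 && n <= SLOT_SIZE);
    pthread_mutex_lock(&s->lock);
    if (s->len == 0) {
        memcpy(s->data, src, n);
        s->len = n;
        stored = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return stored;
}

/* Consumer side: returns bytes copied, or 0 if the slot is empty. */
size_t slot_get(struct slot *s, char *dst, size_t cap)
{
    size_t n = 0;
    pthread_mutex_lock(&s->lock);
    if (s->len != 0 && s->len <= cap) {
        n = s->len;
        memcpy(dst, s->data, n);
        s->len = 0;          /* mark the slot empty again */
    }
    pthread_mutex_unlock(&s->lock);
    return n;
}

static void *producer_main(void *arg)
{
    struct slot *s = arg;
    while (!slot_put(s, "hello", 5))
        sched_yield();       /* slot still full, try again */
    return NULL;
}

/* Runs one producer thread against a polling consumer; 0 means success. */
int slot_demo(void)
{
    struct slot s;
    char out[SLOT_SIZE];
    pthread_t t;
    size_t n;

    memset(&s, 0, sizeof s);
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, producer_main, &s) != 0)
        return -1;
    while ((n = slot_get(&s, out, sizeof out)) == 0)
        sched_yield();       /* nothing published yet */
    pthread_join(t, NULL);
    pthread_mutex_destroy(&s.lock);
    return (n == 5 && memcmp(out, "hello", 5) == 0) ? 0 : 1;
}
```

    slot_demo() returns 0 when the consumer saw exactly what the producer 
wrote. This is slower than a lockless scheme, but it doesn't depend on what 
any particular CPU happens to do with its caches or its store ordering.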

> I thought I understood this quite well, but clearly I must be missing
> something (unless I am not enforcing an order to 'len' and 'data' via
> the lock prefix).

    Again, what does the behavior of the lock prefix on x86 have to do with 
anything? If this is x86-specific code, you need to say that, and you need 
to specify whether it's intended to be guaranteed to work on all future x86 
processors or just the ones that exist now.

    DS
