Subj: Re: lockless low-overhead 'pipes' (w/semaphores)
To:   comp.programming.threads
From: Joe Seigh
Date: Wed Apr 20 2005 07:39 am

On 19 Apr 2005 20:14:04 -0700, wrote:

> Ok, so in the problem you iterate, the producer stores some data, which
> invariably ends up staying in a processor's local cache (say L1 or L2),
> and when the consumer, executing on another processor, attempts to
> access the data, it wouldn't see the cached data located on the other
> processor? Don't today's SMP architectures guarantee visibility in
> situations like this (i.e. cache coherence)?

Cache is transparent. You can't even tell it's there except for
performance effects. And you cannot write a multi-threaded program whose
correctness depends on the presence or absence of cache (except for
Alpha, which is no longer significant). It's as relevant to the issue of
correctness as the color of your computer case. And if you or anybody
else starts talking about the color of your computer case, the rest of
us aren't going to take you or them too seriously.

> If they do - the data is stored and THEN 'len' is updated. If this
> order is preserved, then how could the consumer see the updated 'len'
> before the 'data' has reached memory?
>
> If you're talking about out-of-order reads/writes, at both the compiler
> and processor level - the atomic_* functions imply a memory barrier
> (lock prefix for i486+), so that's a non-issue, right? Or does the lock
> prefix not enforce order, and do I need a separate instruction to
> provide store-store (producer) and load-store (consumer) barriers?

If you are going to program at that level you should become more
familiar with what memory ordering mechanisms the architecture provides.

> Also, if the lock prefix doesn't work the way I thought it did, are
> there any other places that need some kind of memory barrier that I may
> be missing?
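[Editor's note: the "store data, THEN update 'len'" publication pattern being
asked about can be sketched with the portable C11 atomics that later
standardized these ordering mechanisms. The `slot`, `publish`, and `consume`
names are illustrative, not the poster's API.]

```c
#include <stdatomic.h>
#include <string.h>

/* A single-producer/single-consumer slot, a minimal sketch of the
 * "write data, then publish len" ordering the question is about. */
typedef struct {
    char data[64];
    atomic_size_t len;   /* 0 means "empty"; nonzero publishes data */
} slot;

/* Producer: write the payload first, THEN store len with release
 * semantics, so no consumer can observe the new len before the data. */
void publish(slot *s, const char *src, size_t n) {
    memcpy(s->data, src, n);
    atomic_store_explicit(&s->len, n, memory_order_release);
}

/* Consumer: load len with acquire semantics; if it is nonzero, every
 * store made before the matching release store is guaranteed visible. */
size_t consume(slot *s, char *dst) {
    size_t n = atomic_load_explicit(&s->len, memory_order_acquire);
    if (n)
        memcpy(dst, s->data, n);
    return n;
}
```

On x86 a lock-prefixed store for 'len' would also work, but the point of
the release/acquire pair is that it expresses the required ordering
portably rather than relying on one architecture's behavior.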
> However, if the problem you mention exists NOT because of out-of-order
> stores/loads, and the 'len' could reach memory while the 'data' is
> still stuck in the cache, even after a 'lock' prefix, then I don't see
> how any program executing on multiple processors could guarantee
> coherency between data on separate processors. If there's no way to
> guarantee cache coherency, then synchronization would be near
> impossible, no?
>
> I thought I understood this quite well, but clearly I must be missing
> something (unless I am not enforcing an order on 'len' and 'data' via
> the lock prefix).
>
> Could you please elaborate? I really would appreciate the help, thanks.

It's hard to tell what you're doing wrong since you're using a
non-standard API. It does appear, though, that you're assuming that if
one thread orders its memory accesses with memory ordering mechanisms,
any other thread will see those memory accesses in the same order
without having to use any memory ordering mechanisms itself. This is not
true in most memory models.

So if one thread has to do A and B to memory in that order, then it
needs a memory barrier between A and B. And if another thread has to see
B and then A in that order, then it needs a memory barrier between B and
A.

And we haven't even gotten into the issues of atomicity, word tearing,
forward progress, etc. yet.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
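[Editor's note: the pairing rule above - a barrier between A and B in the
producer, and a matching barrier between B and A in the consumer - can be
sketched with explicit C11 fences. The variable names A and B follow the
reply; everything else is illustrative.]

```c
#include <stdatomic.h>
#include <stdbool.h>

int A;                       /* payload, written before the flag      */
atomic_bool B = false;       /* flag, published after the payload     */

/* Producer: store A, barrier, store B - the fence keeps the two
 * stores from being reordered past each other. */
void producer(void) {
    A = 42;                                                 /* A     */
    atomic_thread_fence(memory_order_release);              /* A | B */
    atomic_store_explicit(&B, true, memory_order_relaxed);  /* B     */
}

/* Consumer: load B, barrier, load A - the matching fence on this
 * side is what makes the producer's ordering visible here.  Without
 * it, seeing B == true would NOT guarantee seeing A == 42. */
int consumer(void) {
    while (!atomic_load_explicit(&B, memory_order_relaxed)) /* B     */
        ;
    atomic_thread_fence(memory_order_acquire);              /* B | A */
    return A;                                               /* A     */
}
```

Note that each side supplies its own fence: the release fence in the
producer only does its job when paired with the acquire fence in the
consumer, which is exactly the point of the reply.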