Subj: Using hierarchical memory as an acquire memory barrier for dependent
To:   comp.programming.threads,comp.arch
From: Joe Seigh
Date: Sun Sep 11 2005 01:29 pm

While perusing patent applications for lock-free techniques, I noticed the one by Paul McKenney titled "Software implementation of synchronous memory barriers" (20020194436), which basically describes an RCU-like technique for eliminating the need for acquire memory barriers by using signals to force them on an as-needed basis. That explains part of the discussion of RCU+SMR when it occurred on LKML.

It seems to me you could use the memory hierarchy to accomplish the same effect. I'm assuming that out-of-order dependent memory accesses won't cross memory hierarchy boundaries. There are two levels of the memory hierarchy we could exploit, possibly without any special kernel support: one is cache and the other is virtual memory.

Basically, when memory is freed with proper release semantics, either explicit or supplied by RCU, you use another form of RCU to determine when all processors have flushed or purged references to that memory from their cache or from their translation lookaside buffer. The purge/flush instructions are the quiescent states for this second form of RCU. Once all processors have quiesced (dropped references), it is safe for writer threads to use that memory for updates using RCU.

The problem is that as soon as the writer thread starts accessing that memory, it can start polluting the cache and/or the translation lookaside buffer, with no assurance that they will be flushed if a reader thread is dispatched on that processor before the writer thread completes its updates to memory. The way around that problem is to use two different memory mappings for the same memory: one for use by writer threads and one for use by reader threads. This trick depends on the cache working on virtual memory, not real memory, if you're using cache as the memory hierarchy level.
Also, addresses stored in memory should be valid in the reader memory mapping if you don't want readers doing offset/address-to-address conversion. The granularity of memory management would be the cache line size or the page size, depending on which mechanism you are using.

On one level, this is similar to what virtual memory managers do when managing real memory. In fact, one of the earliest uses of an RCU-like technique was by MVS, to determine when it was safe to reallocate real memory to another mapping, with SIGP signaling to speed things up if it needed to steal pages.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.