Subj: Using hierarchical memory as an acquire memory barrier for dependent
To:   comp.programming.threads,comp.arch
From: Joe Seigh
Date: Sun Sep 11 2005 01:29 pm

While perusing patent applications for lock-free techniques, I noticed the one by Paul McKenney titled "Software implementation of synchronous memory barriers" (20020194436), which basically describes an RCU-like technique for eliminating the need for acquire memory barriers by using signals to force them on an as-needed basis. That explains part of the discussion of RCU+SMR when it occurred on LKML.

It seems to me you could use the memory hierarchy to accomplish the same effect. I'm assuming that out-of-order dependent memory accesses won't cross memory hierarchy boundaries. There are two levels of the memory hierarchy we could exploit, possibly without any special kernel support: one is cache and the other is virtual memory.

Basically, when memory is freed with proper release semantics, either explicit or supplied by RCU, you use another form of RCU to determine when all processors have flushed or purged references to that memory from their cache or from their translation lookaside buffer. The purge/flush instructions are the quiescent states for this second form of RCU. Once all processors have quiesced (dropped references), it is safe for writer threads to use that memory for updates using RCU.

The problem is that as soon as the writer thread starts accessing that memory, it can start polluting the cache and/or the translation lookaside buffer, with no assurance that they will be flushed if a reader thread is dispatched on that processor before the writer thread completes its updates to memory. The way around that problem is to use two different memory mappings for the same memory: one for use by writer threads and one for use by reader threads. This trick depends on the cache working on virtual memory, not real memory, if you're using cache as the memory hierarchy level.
Also, addresses stored in memory should be valid in the reader memory mapping if you don't want readers doing offset/address-to-address conversion. The granularity of memory management would be the cache line size or the page size, depending on which mechanism you are using.

On one level, this is similar to what virtual memory managers do when managing real memory. In fact, one of the earliest uses of an RCU-like technique was by MVS, to determine when it was safe to reallocate real memory to another mapping, with SIGP signaling to speed things up if it needed to steal pages.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.