Subj : Re: Using hierarchical memory as an acquire memory barrier for dependent
To : comp.programming.threads,comp.arch
From : David Hopwood
Date : Mon Sep 12 2005 12:11 am

Joe Seigh wrote:
> While perusing patent applications for lock-free techniques, I noticed
> the one by Paul McKenney titled "Software implementation of synchronous
> memory Barriers" (20020194436) which basically describes an RCU-like
> technique for eliminating the need for acquire memory barriers by using
> signals to force them on an as-needed basis. That explains that part of
> the discussion on RCU+SMR when it occurred on LKML.
>
> It seems to me you could use the memory hierarchy to accomplish the same
> effect. I'm assuming that out-of-order dependent memory accesses won't
> cross memory hierarchy boundaries.
>
> There are two levels of the memory hierarchy we could exploit, possibly
> without any special kernel support. One is cache and the other is virtual
> memory. Basically, when memory is freed with proper release semantics, either
> explicit or supplied by RCU, you use another form of RCU to determine when
> all processors have flushed or purged references to that memory from their
> cache or from their translation lookaside buffer. The purge/flush
> instructions are the quiescent states for the second form of RCU. Once all
> processors have quiesced (dropped references), it is safe for writer threads
> to use that memory for updates using RCU.
>
> The problem is that as soon as the writer thread starts accessing that
> memory, you can start polluting cache and/or the translation lookaside buffer
> with no assurance that they will be flushed if a reader thread is dispatched on
> that processor before the writer thread completes its updates to memory. The
> way around that problem is to use two different memory mappings for the
> same memory, one for use by writer threads and one for use by reader
> threads.
> This trick depends on cache working on virtual memory not real memory if
> you're using cache as the memory hierarchy level.

Madness. Reading all that patent-obfuscated English has addled your brain.

Cache is supposed to be a transparent abstraction. So is the TLB (software
TLBs notwithstanding). Breaking that will break anyone's ability to
understand what the system is doing, just in order to try (without
necessarily succeeding) to optimize something that isn't a performance
bottleneck.

-- 
David Hopwood
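
For anyone unsure what "two different memory mappings for the same memory" means in the quoted proposal, here is a minimal sketch. It assumes a scratch file is an acceptable stand-in for the shared backing store, and the helper name make_dual_mapping is hypothetical. It shows only the aliasing (a writable view and a read-only view of the same pages). It says nothing about cache or TLB behaviour, which cannot be observed from this level, and it is not a statement of how the scheme would actually be built.

```python
import mmap
import tempfile


def make_dual_mapping(size=mmap.PAGESIZE):
    """Map the same backing pages twice: one writable view, one read-only.

    Hypothetical helper for illustration only.
    Returns (writer, reader, backing); keep `backing` alive while in use.
    """
    backing = tempfile.TemporaryFile()  # scratch file used as shared backing store
    backing.truncate(size)              # extend the file so `size` bytes can be mapped
    # Two separate mappings of the same file, so two distinct virtual
    # address ranges refer to the same underlying pages.
    writer = mmap.mmap(backing.fileno(), size, access=mmap.ACCESS_WRITE)
    reader = mmap.mmap(backing.fileno(), size, access=mmap.ACCESS_READ)
    return writer, reader, backing


if __name__ == "__main__":
    writer, reader, backing = make_dual_mapping()
    writer[0:5] = b"hello"   # store through the writer mapping
    print(reader[0:5])       # load the same bytes through the reader mapping
    reader.close()
    writer.close()
    backing.close()
```

The reader view rejects stores, so the only thing separating "writer" from "reader" here is which mapping a thread is handed. Whether that separation buys anything in terms of ordering is exactly the point in dispute above.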