Subj : Hazard Pointers w/o memory barrier. To : comp.programming.threads From : Joe Seigh Date : Tue Apr 19 2005 09:32 am You can get rid of the expensive memory barrier when you set hazard pointers by combining it with a version of RCU that uses context switches (voluntary and involuntary), signals, syscalls or anything that causes processor serialization as quiescent states. You delay checking the hazard pointers until you have an RCU checkpoint (i.e. all threads have quiesced at least once). If you're using a polling for SMR/hazard pointers this won't cause any appreciable extra delay. The processor serialization from the RCU quiesce points ensure that the hazard pointer operations are seen in correct order without requiring a memory barrier. However, you still need to ensure the compiler correctly orders the instructions. This will probably require inline assembler. On pipelined architectures setting a hazard pointer is pretty close to a raw pointer load, about 2X on powerpc. On Solaris and possibly Aix, you can use /proc to get context switching counts for threads to use as quiesce points. For other unices and Linux, you will need to periodically signal each thread to execute a signal handler that increments a signal count to use as quiesce points. For windows, there's no asynchronous signal facility or /proc, but there's a hack that will work. I won't disclose it since I don't support windows and I want to see how long it takes someone else to figure it out. If you can determine thread state, i.e. running or not running, you can use the not running as a quiesce state. This takes care of some forward progress problems RCU has with suspended threads not quiescing periodically if they've been indefinitely suspended. The advantage of this over pure RCU is that it allows you to hold local references for indefinite periods of time. With RCU you have to keep the references short since RCU is a proxy GC and a thread not quiescing holds up GC for everything. Comparison with deferred reference counting. Deferred reference counting has the problem that in C/C++ it has no control over how local references are expressed and used, unlike Java VM's. Although not likely, a pointer doesn't have to like like a pointer to something, it could be an offset or a hash of something. More problematic is a pointer put into a locally owned heap object. The thread stack only scan would never catch this since the reference isn't on the stack. And although a stack GC scan is cheaper than scanning the heap plus stacks, it's still expensive. Hazard pointers get around this problem by explicity and formally declaring local references/usage. There's an alternative method of pointer usage declation where you'd simply store the pointer into the hazard pointer w/o the reload and recheck, and at scan time check the stack frame register save area as well as the hazard pointers. This is cheaper than scanning the whole stack but as the hazard pointer reload and recheck are very cheap on a pipelined cpu to begin with, I don't think it's worth it. On Z architecture you could use a MVC instruction to store the global pointer into the hazard pointer and then load from the hazard pointer rather than the usual hazard pointer logic, since the MVC instruction isn't interruptable. I don't know if it's any faster but you could do it. I think that's it. -- Joe Seigh When you get lemons, you make lemonade. When you get hardware, you make software. .