Subj : Re: Hazard Pointers w/o memory barrier.
To   : comp.programming.threads
From : Chris Thomasson
Date : Tue May 03 2005 11:25 pm

>> Yes it can. I have already been experimenting with it for a couple of
>> days and am extremely pleased with the initial results. The thing
>> that makes me a bit uneasy with my windows ( and some other os's )
>> implementation is the dependence on undocumented functions. All of my
>> code that uses this form of garbage collection could suddenly stop
>> working when the next service pack gets installed...
>
> You could always have a fall back plan of putting the membar back in
> that case. It would run slower, but that's Microsoft's problem,
> especially since it would be their platform running worse than their
> competitors' platforms.

Yeah. So far, windows makes it relatively easy to detect quiescent
periods.

>>> slightly crippled version of RCU to avoid imperial
>>> entanglement.
>>
>> We aren't the programmers they're looking for... ;)
>>
>> Seriously though, this is why I don't provide the public with my
>> user-space rcu ( URCU ) implementation. Did you inform McKenney that
>> you were posting your algorithm? I am not sure if I need to ask
>> permission...
>
> McKenney and Michael both know about it. The problem is you have to
> stick with an implementation based on the first patent that is in the
> public domain. The claims were more narrow than they might have been
> because we believed they had to be tied to the actual methods in the
> patent.

What were the claims for the patented version of RCU? Was it the
algorithm that used two counters and three per-cpu rcu-batch lists
( nxtlist->curlist->donelist )?

> Not really a problem if you're monitoring per processor/thread
> counters or clocks, since that's probably how you'd have to do it,
> but with a common counter you could speed up the polling somewhat,
> and it wouldn't strictly be covered by the first patent.

It seems that tracking per-cpu counters "and" keeping the batches on
per-cpu lists would infringe on the patent. Humm... My implementation
differs from this, but not much.

>> Yeah. Original SMR is well suited for high write throughput because
>> of the guarantees it puts on the way nodes are deleted. This new
>> URCU_SMR thing can't seem to handle frequent and persistent writes
>> as well as the original. It caused some problems in a couple of my
>> algorithms. It's probably a good idea to export two versions of
>> SMR...
>
> If you have a really high write rate then you would probably be
> better off with a mutex or one of the write lock-free algorithms.

Unfortunately, using a mutex slows down every test and the kernel
usage goes off the charts. The specific algorithm that this caused
problems with is the lock-free stack that AppCore exports. It's
heavily optimized for a high number of writes. It allocates its nodes
internally from a three-part cache:

  per_thread -> lock-free_lifo -> malloc/free

and it frees the nodes into a two-part cache ( sketched below ):

  lock-free_lifo -> smr_collector

So a write is a mfence->cmpxchg8b, and the cache is a cmpxchg8b. No
kernel involved. When I put the stack under high load
( 5,000,000-10,000,000 push->pop cycles through 32-64 threads ), the
smr_collector gets used quite frequently. The original SMR algo has an
upper bound ( batch_size ) on the number of pending nodes, which makes
everything work fine. URCU-SMR batch sizes "can" be volatile under
load, and the memory usage started to grow and grow under stress
testing.
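To make that cache layering concrete, here is a minimal sketch of the
allocation fallback chain. The names ( node_t, shared_lifo_pop, ... )
are invented for illustration, and the shared LIFO is simply assumed
to be a DWCAS/cmpxchg8b lock-free stack -- this is not the actual
AppCore code:

  #include <stdlib.h>

  typedef struct node_s { struct node_s *next; } node_t;

  /* tier 1: per-thread free list -- plain loads/stores, no atomics */
  static __thread node_t *tls_cache = 0;

  /* tier 2: the shared lock-free LIFO ( one cmpxchg8b per op );
     assumed to exist elsewhere -- the LIFO itself is the hard part */
  extern node_t *shared_lifo_pop(void);
  extern void    shared_lifo_push(node_t *);

  static node_t *node_alloc(void)
  {
      node_t *n = tls_cache;
      if (n) { tls_cache = n->next; return n; } /* tier 1: no atomics */
      n = shared_lifo_pop();                    /* tier 2: one cmpxchg8b */
      if (n) return n;
      return malloc(sizeof *n);                 /* tier 3: slow path */
  }

  static void node_free(node_t *n)
  {
      /* nodes that might still be visible to lock-free readers must
         go through the smr collector instead; only provably private
         nodes get pushed straight back onto the shared LIFO */
      shared_lifo_push(n);
  }

The whole point of the per-thread tier is that the common case touches
no atomics at all; the cmpxchg8b only shows up when a thread's private
list runs dry.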
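And the upper bound I keep referring to in the original SMR algo is
basically this, loosely after Michael's published algorithm ( again,
hypothetical names and constants, membars elided for brevity ):

  #include <stdlib.h>

  #define MAX_THREADS    64
  #define HP_PER_THREAD  2
  #define HP_TOTAL       ( MAX_THREADS * HP_PER_THREAD )
  #define BATCH_SIZE     ( 2 * HP_TOTAL ) /* cap on pending nodes */

  typedef struct node_s { struct node_s *next; } node_t;

  /* global hazard pointer table; readers publish a pointer here
     ( with the usual store/membar protocol ) before dereferencing */
  extern node_t *volatile g_hazard[HP_TOTAL];

  static __thread node_t *retired  = 0; /* per-thread pending list */
  static __thread int     nretired = 0;

  static int is_hazard(node_t *n)
  {
      int i;
      for (i = 0; i < HP_TOTAL; ++i)
          if (g_hazard[i] == n) return 1;
      return 0;
  }

  static void smr_scan(void)
  {
      node_t *n = retired, *keep = 0;
      retired = 0; nretired = 0;
      while (n) {
          node_t *next = n->next;
          if (is_hazard(n)) { n->next = keep; keep = n; ++nretired; }
          else free(n);       /* no reader can touch it: reclaim now */
          n = next;
      }
      retired = keep;
  }

  static void smr_retire(node_t *n)
  {
      n->next = retired;
      retired = n;
      /* the invariant that keeps memory bounded: a scan fires the
         moment the batch fills, so the pending list can never grow
         past BATCH_SIZE plus the few nodes actually pinned */
      if (++nretired >= BATCH_SIZE)
          smr_scan();
  }

URCU-SMR loses that hard cap when the batch sizes float under load,
which is exactly the growth I was seeing.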
I put in some code to alter the polling rate dynamically and
auto-adjust to the current number of pending objects, which helped
out, but I could still overload the URCU-SMR system under artificial
high load ( see the sketch at the end of this post ).

Here is the actual code that does not seem to like URCU-SMR under
high load:

http://appcore.home.comcast.net/appcore/include/cpu/i686/ac_i686_stack_mpmc_h.html
http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_stack_mpmc_c.html
http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html
( np_ac_i686_lfgc_smr_stack_mpmc_pop_dwcas is located near the top of
the page )

> I'm not claiming the read lock-free stuff is good for all situations.

Yeah, I know. I just felt the need to ( nit ) point out that little
caveat, since it can directly affect my library in an adverse way... ;)
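P.S. -- the auto-adjust heuristic I mentioned above is roughly this
shape ( a simplified sketch; the constants and names are invented, not
the actual code ):

  /* tighten the collector's polling interval as the backlog of
     pending objects grows; relax it again as the backlog drains */

  #define POLL_MIN_USEC  100   /* floor under heavy load */
  #define POLL_MAX_USEC  10000 /* ceiling when idle */
  #define PENDING_TARGET 256   /* backlog we're comfortable with */

  static unsigned g_poll_usec = POLL_MAX_USEC;

  static void adjust_poll_interval(unsigned pending)
  {
      if (pending > PENDING_TARGET) {
          /* backlog growing: halve the interval, poll harder */
          g_poll_usec /= 2;
          if (g_poll_usec < POLL_MIN_USEC) g_poll_usec = POLL_MIN_USEC;
      } else if (pending < PENDING_TARGET / 2) {
          /* backlog drained: back off gently */
          g_poll_usec += g_poll_usec / 4;
          if (g_poll_usec > POLL_MAX_USEC) g_poll_usec = POLL_MAX_USEC;
      }
  }

The collector thread just sleeps g_poll_usec between scans of the
pending list. It helped, but as I said, a hard enough artificial load
could still outrun it.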