Subj : Re: C++ desired features
To   : (Usenet)
From : Joe Seigh
Date : Fri Feb 25 2005 09:34 pm

On 24 Feb 2005 20:10:45 -0500, Hans Boehm wrote:
> Joe Seigh wrote:
>> Multithreading usage patterns tend to fall into producer/consumer or
>> reader/writer patterns. Message passing works well with the former
>> but doesn't work well with the latter. Mutexes don't scale well with
>> the latter by definition. rwlocks don't scale so well either.
>> Lock-free stuff seems to do quite well. You can crank the contention
>> level up to 11, which will kill mutex and rwlock based solutions,
>> and the lock-free stuff doesn't even seem to breathe heavily.
>
> In my experience, lock-free solutions are great, especially when
> someone else has already dealt with the tricky correctness issues. In
> particular, they can safely be used from signal/interrupt handlers,
> etc.
>
> But if you look simply at performance, in my experience at least, they
> can win or lose, depending on implementation details, hardware
> characteristics, etc. For example, I reported a small experiment in
> http://portal.acm.org/citation.cfm?doid=1011767.1011774 (or
> http://www.hpl.hp.com/techreports/2004/HPL-2004-105.html). Besides the
> actual content of the paper, the last section compares a simple
> lock-free stack vs. a solution based on simple spin-sleep locks. The
> winners are different for a 2xPII/266 and a 4xPPro machine, and that's
> pretty similar hardware. (There were also OS differences, but I'm not
> sure they matter for this particular comparison. If they do, I don't
> understand the effect.)

Even though I did an eventcount implementation on Linux, which is
designed to be used with lock-free algorithms, I haven't really messed
around with it much. It seems that unless you block only rarely, you
might as well use mutexes with condvars: I could only do 10% better
than a fast condvar implementation when blocking was actually
occurring.

> As many people have pointed out, the real problem with contention is
> typically that a critical cache line (either holding the lock, or the
> lock-free data structure) is transferred back and forth between
> processor caches for every access. That's inherently not cheap.
> (Inappropriate lock implementations may of course make it much worse,
> as you can also see in the above test.)

The stuff I was experimenting with was reader/writer, with the writers
using mutexes and the readers being lock-free. The lock-free algorithms
were an RCU implementation for preemptive user threads, and a proxy GC
based on atomic_ptr, an atomic lock-free reference-counted smart
pointer. The testcase has reader threads traversing a linked list while
writer threads modify the list. The testcase exits after the writer
threads have done a fixed number of modifications, so it's the read and
write rates, not the testcase run time, that are significant. There's a
-t option that prints the running counts and rates, so if you crank up
the contention you can verify that the mutex and rwlock cases aren't
going anywhere. It's at http://atomic-ptr-plus.sourceforge.net/

The project is on hiatus for a while, though I might port atomic_ptr to
PowerPC using a different algorithm (no cmpxchg8b, just lwarx/stwcx).

--
Joe Seigh

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
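
A rough illustration of the eventcount usage pattern mentioned in the
post. This is a minimal sketch only: the class name eventcount and its
get/wait/signal interface are invented for illustration, and this
version is built entirely on a mutex and condvar, whereas a real
eventcount (such as the Linux implementation Seigh refers to) keeps the
get() and signal() fast paths lock-free. That it leans on mutex+condvar
throughout is exactly why it would not beat plain mutex/condvar code
when blocking is common.

    // Minimal eventcount sketch (interface invented for illustration).
    #include <condition_variable>
    #include <mutex>

    class eventcount {
        std::mutex m;
        std::condition_variable cv;
        unsigned count = 0;
    public:
        // Take a "ticket" before re-checking the lock-free structure.
        unsigned get() {
            std::lock_guard<std::mutex> g(m);
            return count;
        }
        // Block until a signal() has advanced the count past the ticket.
        void wait(unsigned ticket) {
            std::unique_lock<std::mutex> g(m);
            cv.wait(g, [&] { return count != ticket; });
        }
        // Producer side: advance the count and wake any waiters.
        void signal() {
            { std::lock_guard<std::mutex> g(m); ++count; }
            cv.notify_all();
        }
    };

    // Typical consumer loop against some lock-free queue q:
    //   for (;;) {
    //       if (item* i = q.try_pop()) return i;  // fast path, no block
    //       unsigned t = ec.get();
    //       if (item* i = q.try_pop()) return i;  // re-check: avoids a
    //       ec.wait(t);                           // lost wakeup
    //   }
    // Producer side: q.push(i); ec.signal();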
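
And a sketch of the reader/writer shape of the testcase: readers
traverse without taking any lock, while writers serialize among
themselves and publish through an atomically updated, reference-counted
head pointer. This is rendered with the C++11 shared_ptr atomic free
functions rather than the actual atomic_ptr code; most standard
libraries back those functions with an internal spinlock, so unlike
atomic_ptr this is not genuinely lock-free, and unlike the real
testcase the writers here only push at the front, so that interior next
pointers stay immutable once published.

    // Sketch only: the access pattern, not the atomic-ptr-plus code.
    #include <memory>
    #include <mutex>

    struct Node {
        int value = 0;
        std::shared_ptr<Node> next;  // refcount keeps unlinked nodes
    };                               // alive until the last reader drops them

    std::shared_ptr<Node> head;      // shared list head
    std::mutex writer_mutex;         // writers serialize only against writers

    // Reader: no lock; grab a counted reference to the head and walk.
    long reader_pass() {
        long sum = 0;
        for (auto p = std::atomic_load(&head); p; p = p->next)
            sum += p->value;
        return sum;
    }

    // Writer: push a new node at the front and publish the new head.
    void writer_push(int v) {
        std::lock_guard<std::mutex> guard(writer_mutex);
        auto n = std::make_shared<Node>();
        n->value = v;
        n->next = std::atomic_load(&head);
        std::atomic_store(&head, n);
    }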