Subj : Re: Windows Suspend/ResumeThread API equivalent for Linux or POSIX? To : comp.programming.threads From : Chris Thomasson Date : Thu May 12 2005 12:48 am >> Here is a paper describing the original SMR algorithm: >> >> http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf > > I can tell you from experience with similar algorithms that these > algorithms do not work well in user-space. To get the performance boost, > you need more control over your environment and the low-level scheduling > behavior than is possible to get in that environment. (Though they may get > you real benefits in kernel space.) Actually, both SMR and RCU can work well in user-space. Here are two working full-blown examples: http://appcore.home.comcast.net/ ( User-Space SMR ) http://atomic-ptr-plus.sourceforge.net/ ( User-Space RCU. Joe Seigh has a nice lock-free read test setup. ) > I can also tell your from experience that it is often the case that > when you think you need this type of alogorithm, what you really need to > do is rearchitect or make other subtle changes. Avoiding contention in the > first place is better than algorithms that make contention less expensive. Well, these algorithms can be used to provide scaleable solutions to the reader/writer problem. In particular, the iteration of read-mostly data-structures. I can't seem to design a lock-based solution that can get equivalent read-side performance numbers for this special case. If you can present a lock-based solution for iterating read-mostly data structures that can beat these types of algorithms I will gladly post it to my site. My site is about mixing lock-free with lock-based, so a high-speed lock-based solution to the reader-writer problem would be greatly welcome. > Of course, I'm not familiar with this exact algorithm or your exact > application. So you may have the 2% case where the usual guidelines break. > Are you *sure* portable algorithms with mutexes won't get you sufficient > performance? (Or is this a research or proof of concept kind of thing?) This is pure research. I also provide a user-space demo library as a proof of concept. > I would also warn you against making design changes based upon > benchmarks on unrealistic workloads. Many lock-free algorithms, for > example, perform better than locking algorithms on unrealistic workloads. > On realistic workloads, the descheduling of a conflicting thread allows > both threads to run with no further conflicts. This is much more efficient > than cache ping-ponging as both threads operate on the same data even with > no locking cost. Contended locks can suffer from similar problems, in certain situations... Take a global read-write lock protecting a mostly-read data-structure. Under heavy read contention, forward progress can be stalled waiting for the updates to the internal lock state to be shuttled around the CPU's caches. If the size of the read-side critical section is moderate, the cost of moving lock updates from one CPU's cache to another cal rival that of the critical section itself; This can prevent readers from truly executing in parallel. > Descheduling a thread that's conflicting with another thread is *good* > and running it later when it won't conflict with anything is *great*. > Running both threads slowly with a lock-free algorithm is bad because work > that must be done slowly is done instead of work that can be done quickly. > (Of course, you don't see this on unrealistic workloads because there's > nothing else to do.) Lock-free is not good for everything; An efficient marriage between lock-free and lock-based can be better than using one or the other. Take a look at this: http://groups.google.ca/group/comp.programming.threads/msg/7e7834ca10f2613a?hl=en ( read the HP paper describing the performance differences of lock-free with lock-based, and how a mix of both worlds can help. ) .