Subj : Re: STL Containers w/ Virtually Zero-Overhead Lock-Free Reads...
To   : (Usenet)
From : Chris Thomasson
Date : Sun Sep 25 2005 11:13 am

>> I created and patented an algorithm that I am using to build
>> high-performance shared containers.
>
> How does this compare to Microsoft's SLists (nonblocking singly-linked
> lists), that are part of the Windows SDK?

They are simply not comparable, because the "SList" algorithm belongs to
a completely different category. BTW, did you know that this particular
algorithm has been around for ages? It is better known as the IBM
FreeList:

http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf

Look in Appendix A, under the multi-processing examples. You will
actually find Microsoft's algorithm in an IBM publication... ;)

For what it's worth, here is my old, experimental x86 implementation of
the IBM FreeList:

http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html

The algorithm resides in the following functions:

  ac_i686_stack_mpmc_push_cas
  np_ac_i686_stack_mpmc_pop_dwcas

The data-structure is here:

http://appcore.home.comcast.net/appcore/include/cpu/i686/ac_i686_h.html

  ac_i686_stack_mpmc_t

You can use this on Linux; no Windows required! ;)

Please note, my code uses the mfence instruction for its SMR
implementation. This instruction isn't available on every x86, so you
might need to change it to a (lock xadd) instruction that operates on a
dummy location.

>> I have only used the invention for shared containers so far...
>
> Which containers have you implemented?

Mainly singly- and doubly-linked lists, hash-maps, arrays and trees. I
am also implementing smart pointers, eventcounts, and memory
allocators...

> Do you have performance data?

I have a whole lot of performance data, and I am getting ready to
present it; it should be ready before 2006. There will also be some
portable and adjustable demo applications that show off some rather
amazing performance benefits.
The demos will show how reference counting and linked-lists can be
efficiently implemented for multi-processor systems. For instance...

There is a demo that creates a group of reader threads and a couple of
writer threads. Each reader acquires a reference to a shared object,
reads some data from the object, randomly yields, releases the
reference, and repeats until the object is null. Every so often, a
writer thread swaps the shared object with a new one and then destroys
the old copy. The writer threads swap in a null pointer to end the test
after they have performed all of their writes. I measure the average
number of reads from the shared object that a reader thread performed
per second over the duration of the test.

Here are some quick numbers for a very common processor:

P4 HyperThreaded 3.06ghz ( I have also tested SMP )

Boost Mutex
10 Readers - 2 Writers ( 500 updates )
-------------------------
22,254 avg. reads per-second per-reader.

Generic Proxy-Based Collector
10 Readers - 2 Writers ( 500 updates )
-------------------------
583,045 avg. reads per-second per-reader.

SMR Hazard Pointers
10 Readers - 2 Writers ( 500 updates )
-------------------------
706,881 avg. reads per-second per-reader.

VZOOM
10 Readers - 2 Writers ( 500 updates )
-------------------------
1,235,159 avg. reads per-second per-reader.

The reason VZOOM allows the reader threads to achieve so many reads
per second is that, unlike the others, it doesn't use any loops, atomic
operations, or memory barriers to acquire/release a reference to the
shared object. When the demo is published, you will be able to "test
and tweak" on various systems and see some of the numbers for
yourself... ;)

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]