Subj : Re: STL Containers w/ Virtually Zero-Overhead Lock-Free Reads...
To   : (Usenet)
From : Chris Thomasson
Date : Sun Sep 25 2005 11:13 am

>> I created and patented an algorithm that I am using to build
>> high-performance shared containers.
>
> How does this compare to Microsoft's SLists (nonblocking singly-linked
> lists), that are part of the Windows SDK?

They are simply not comparable, because the "SList" algorithm belongs to
a completely different category. BTW, did you know that this particular
algorithm has been around for ages? It is better known as the IBM
FreeList:

http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf

Look in Appendix A, under the multi-processing examples. You will
actually find Microsoft's algorithm in an IBM publication... ;)

For what it's worth, here is my old, experimental x86 implementation of
the IBM FreeList:

http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html

The algorithm resides in the following functions:

  ac_i686_stack_mpmc_push_cas
  np_ac_i686_stack_mpmc_pop_dwcas

The data-structure is here:

http://appcore.home.comcast.net/appcore/include/cpu/i686/ac_i686_h.html

  ac_i686_stack_mpmc_t

You can use this on Linux; no Windows required! ;)

Please note, my code uses the mfence instruction for its SMR
implementation. This instruction isn't available on every x86, so you
might need to change it to a (lock xadd) instruction that operates on a
dummy location.

>> I have only used the invention for shared containers so far...
>
> Which containers have you implemented?

Mainly singly- and doubly-linked lists, hash-maps, arrays and trees. I
am also implementing smart pointers, eventcounts, and memory
allocators...

> Do you have performance data?

I have a whole lot of performance data, and I am getting ready to
present it; it should be ready before 2006. There will also be some
portable and adjustable demo applications that show off some rather
amazing performance benefits.
The demos will show how reference counting and linked-lists can be
efficiently implemented for multi-processor systems. For instance...

There is a demo that creates a group of reader threads and a couple of
writer threads. Each reader acquires a reference to a shared object,
reads some data from the object, randomly yields, releases the
reference, and repeats until the object is null. Every so often, a
writer thread swaps the shared object with a new one and then destroys
the old copy. The writer threads swap in a null pointer to end the test
after they have performed all of their writes. I measure the average
number of reads from the shared object that a reader thread performed
per second over the duration of the test.

Here are some quick numbers for a very common processor:

P4 HyperThreaded 3.06ghz ( I have also tested SMP )

Boost Mutex
10 Readers - 2 Writers ( 500 updates )
-------------------------
22,254 avg. reads per-second per-reader.

Generic Proxy-Based Collector
10 Readers - 2 Writers ( 500 updates )
-------------------------
583,045 avg. reads per-second per-reader.

SMR Hazard Pointers
10 Readers - 2 Writers ( 500 updates )
-------------------------
706,881 avg. reads per-second per-reader.

VZOOM
10 Readers - 2 Writers ( 500 updates )
-------------------------
1,235,159 avg. reads per-second per-reader.

The reason VZOOM allows the reader threads to achieve so many reads
per second is that, unlike the others, it doesn't use any loops, atomic
operations, or memory barriers to acquire/release a reference to the
shared object. When the demo is published, you will be able to "test
and tweak" on various systems and see some of the numbers for
yourself... ;)

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]