Subj : Re: STL Containers w/ Virtually Zero-Overhead Lock-Free Reads...
To   : (Usenet)
From : Chris Thomasson
Date : Sun Sep 25 2005 11:08 am

>> a well built generic lock-free read container doesn’t
>> need to rely on internally defined forms of mutual exclusion in order to
>> provide "read-only access" that doesn't use any nasty atomic operations
>> and/or StoreLoad style memory barriers...
>
> Of course.  But the effort you put into making your vector lock-free is
> not needed to make the above work, so in effect it goes to waste.

Correct.


> , so in effect it goes to waste.

My solution, wrt STL containers, doesn’t involve lock-free or lock-based
writes. It’s strictly a solution for readers.





> And even lock free primitives like CAS are significantly more expensive
> than their "plain old" counterparts, so making the vector lock free
> actually imposes an efficiency cost for no benefit in this case.

Humm... I wonder what “plain old” counterparts you might be referring to? I
totally agree that atomic operations are very expensive for modern
processors to execute. This is one of the main reasons why I am working on
completely eliminating them from read-only iterations.




> Anyway "read-only access" usually works just fine even with a
> completely thread-unaware container

Humm... This is not really true in a multi-threaded environment...


What happens if a writer thread pops a node and frees it while there are a
couple of reader’s threads concurrently performing iterations? How can we
efficiently and portably ensure that the readers don’t segfault? I will
refer to this specific situation as the “reader/writer problem”.


Now, you might be thinking that shared_ptr could be a viable solution to the
problem. Unfortunately, it’s simply not compatible; believe it or not,
shared_ptr is not a “true” atomic reference count:


http://groups.google.com/group/comp.lang.c++.moderated/msg/0c6f5c5ab72164d0?hl=en

http://groups.google.com/group/comp.programming.threads/msg/a60ce03f406dfcb5

http://groups.google.com/group/comp.programming.threads/msg/93983d56839ffb45


Even if shared_ptr could handle this type of reference counting, it still
relies on expensive atomic operations and StoreLoad style memory barriers
for virtually every reference modification. This produces a high amount of
unneeded overhead.


I guess one could use full-blown garbage collection. IMHO, that would be
overkill and, unfortunately, garbage collectors still produce unneeded
overhead in the form of stop-the-world techniques, memory barriers, ect...




Luckily, all is not lost! :)

There are some existing solutions to the reader/writer problem:

IBM: Read, Copy and Update (RCU)
IBM: Safe Memory Reclamation (SMR)
SUN: Repeat Offender Problem (ROP)

All of these work; however, they are not practical for any sort of “generic
solution”. Here are just a couple of reasons why:

http://groups.google.com/group/comp.programming.threads/msg/b28bdb0051e2c2c6




> , so I'm not sure what you're driving at here.

I am very interested in incorporating a flexible solution to the
reader/writer problem into STL container logic. This would give them the
inherent ability to perform “extremely well” on various modern
multi-processor systems.




>>> Suppose we had lock-free vectors.  Now one thread calls
>>> sort(v.begin(), v.end()) while another calls v.push_back(5).
>>
>> These would be considered writes.
>
> Yes.
>
>>> What does either thread see?
>>
>> It depends on which thread executes first; remember writes are mutually
>> excluded.
>
> The individual writes to the vector are mutually excluded (when you
> have control over them -- which you don't, since the sort algorithm
> gets random access iterators and can thus grab addresses and
> references into the vector), but the call to sort consists of many
> individual writes to the vector.  Even if you _could_ mutually exclude
> each one (which you can't, for reasons I just described) there's no
> reason to think that push_back doesn't sneak between two of them.

The calls to sort and push_back could be protected by an external mutex. The
individual writes would fall under the same critical-section; therefore they
would be atomic. Please note, I am not advocating lock-free writes in any
way, shape or form. You are 100% correct in your assertion that lock-free
writes are generally useless wrt STL Containers; however, this is not true
for lock-free reads...




>>> It seems to me that even if it's possible to
>>> come up with coherent semantics for this case (which in itself is
>>> debatable), it's impossible to come up with _useful_ semantics.
>>> If containers are to be shared across threads, their state will almost
>>> always be just a small piece of some larger invariant, in which case
>>> you'll need to synchronize or "lock-free-ize" at a different level.
>>
>> This is completely compatible with virtually zero-overhead lock-free
>> reads.
>
> Yes, but it renders wasteful any special measures taken or cycles
> spent to do lock free reads.

How so? In the realm of lock-free reads, special measures simply have to be
taken if you don’t want your collection to basically be a massive
race-condition. Writers need to be specially coordinated with the readers in
order to solve the nasty reader/writer problem...




>> Many existing application defined algorithms that decided to use STL
>> containers in multithreaded environments use mutexs to protect everything
>> anyway; therefore they are already compatible with the requirements of
>> the
>> highly flexible lock-free algorithm I am using. Application defined
>> algorithm can literately pick and choose exactly which if their
>> "read-only
>> operations" can be executed without a lock. For instance, a thread that
>> makes frequent read-only const_iteration's over a shared collection would
>> be
>> a great candidate to "lock-free-ize"...
>
> I thought you were "lock-free-izing" the container.

Not the entire container. I just think that they should give an application
programmer the ability to “lock-free-ize” read-only iterations. That's all.




> If so, you have
> done nothing to protect the entire process of iterating over it from
> being disturbed by writes.

This is exactly where a reader/writer solution would come into play. They
can directly address this type of situation.




> If not, maybe you should revise your description of what you're proposing.

I agree. I have a feeling that I am not sufficiently and clearly stating my
case. For that, I am sorry. Let me attempt to clear some things up:


1. I am proposing incorporating a solution to the reader/writer problem into
the STL container logic.

2. I am not advocating lock-free writes in any way, shape, or form.


I think that STL containers should provide the ability for lock-free
read-only access to application programmers. This would allow STL containers
to efficiently address multi-processor systems. I believe this ability would
further help C++ enter the world of high-performance multi-threading...


Any thoughts?




BTW:

Thank you, David, for your input, patience and more importantly, your time.

I really do appreciate it.

:)



      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]

.