Subj : Re: Optimization to Jeffrey Richter's COptex (Windows)?
To   : comp.programming.threads
From : Luke Elliott
Date : Wed Jan 26 2005 03:33 pm

Joseph Seigh wrote:
> On Wed, 26 Jan 2005 05:41:38 GMT, Luke Elliott <lukeelliott@hotmail.com> wrote:
> 
>> Hi
>>
>> I assume most people are familiar with Richter's COptex but if not
>> here's some links:
>>
>> http://www.microsoft.com/msj/0198/win320198.aspx
>> http://www.microsoft.com/msj/0198/win32textfigs.htm#fig2
>>
>> I've been trying to work out why he's used InterlockedExchange() when
>> setting the owning thread id. As far as I can tell it isn't required
>> with a uni-processor. (I haven't fully convinced myself the interlocked
>> is required on SMP but I think it probably is.)
>>
>> I've tried removing the interlocked exchange of thread id (replaced with
>> normal assignment) and unsurprisingly get a considerable performance
>> boost. This was on a uni-processor machine running a stress testing app.
>>
>> Anybody have any views? I'll happily explain a bit more why I think it
>> isn't necessary on UP but I guess I'm hoping for similar convincing
>> views so I don't have to try and put it into words...
>>
> I'm not sure what Richter is using the thread id for since checking for
> it is a debug option.  Most of the stuff out of Redmond is rather strange
> and odd.  I wouldn't look to it for good examples of doing anything.
> 
> Most conventional solutions require the lock owner id to be atomic.
> Setting it and unsetting it is done while holding the lock.  Unlocking
> code checks the owner id without a lock, since the only way it could
> match the current thread is if that thread owned the lock.
> 

Yeah that's what I was thinking (for single processor).

What I was thinking could happen in the SMP case was something like:

1. Thread id 1234 runs on processor 1
*  Acquires lock
*  Releases lock

2. Thread id 5678 runs on processor 2
*  Acquires lock

3. Thread id 1234 runs on processor 3
*  Could the value of the thread id in COptex still be 1234 from [1],
confusing the recurse count? This thread would think it owns the lock
but doesn't.

Except that (presumably) can't happen, because the interlocked ops in
the acquire in [2] and in the first step of [3] will make the processor
caches coherent.
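
For what it's worth, here is roughly how I picture the scheme Joseph describes. It is a minimal sketch in portable C++, not Richter's code: RecursiveSpinLock, run_stress and all the member names are made up for illustration, and I'm assuming the owner id gets cleared before the lock is released. The owner id is written with a relaxed atomic store (about as close as standard C++ gets to a normal assignment), and only the lock word itself uses an interlocked exchange.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical recursive spin lock, for illustration only.
class RecursiveSpinLock {
public:
    void lock() {
        const std::thread::id self = std::this_thread::get_id();
        // Unlocked check: owner_ can only equal self if this thread stored
        // it, and this thread clears it again before it releases the lock.
        if (owner_.load(std::memory_order_relaxed) == self) {
            ++recursion_;
            return;
        }
        // The one interlocked operation on the acquire path.
        while (held_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // Written while holding the lock; no interlocked exchange needed.
        owner_.store(self, std::memory_order_relaxed);
        recursion_ = 1;
    }

    void unlock() {
        assert(owner_.load(std::memory_order_relaxed) ==
               std::this_thread::get_id());
        if (--recursion_ == 0) {
            // Clear the owner before releasing, so this thread can never
            // later read its own stale id.
            owner_.store(std::thread::id(), std::memory_order_relaxed);
            held_.store(false, std::memory_order_release);
        }
    }

private:
    std::atomic<bool> held_{false};
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int recursion_ = 0;  // only touched by the thread that holds the lock
};

// Hammers the lock from several threads, taking it recursively each time.
// Returns the final counter, which equals threads * iterations if mutual
// exclusion held throughout.
long run_stress(int threads, int iterations) {
    RecursiveSpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                lock.lock();  // recursive acquire via the unlocked owner check
                ++counter;
                lock.unlock();
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling run_stress(4, 20000) on an SMP box should give back 80000 if the recursion path never fires for a thread that doesn't own the lock. In this sketch, step [3] is safe if thread 1234 cleared the owner id itself in [1], since a thread can't read back a value older than its own last store to the same variable. If the id were left in place on release, I think the stale read I was worried about would be a real possibility.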
