Subj : Re: Optimization to Jeffrey Richter's COptex (Windows)?
To   : comp.programming.threads
From : Luke Elliott
Date : Wed Jan 26 2005 03:33 pm

Joseph Seigh wrote:
> On Wed, 26 Jan 2005 05:41:38 GMT, Luke Elliott
> wrote:
>
>> Hi
>>
>> I assume most people are familiar with Richter's COptex, but if not,
>> here are some links:
>>
>> http://www.microsoft.com/msj/0198/win320198.aspx
>> http://www.microsoft.com/msj/0198/win32textfigs.htm#fig2
>>
>> I've been trying to work out why he's used InterlockedExchange() when
>> setting the owning thread id. As far as I can tell it isn't required
>> on a uni-processor. (I haven't fully convinced myself the interlocked
>> op is required on SMP, but I think it probably is.)
>>
>> I've tried removing the interlocked exchange of the thread id (replaced
>> with a normal assignment) and, unsurprisingly, got a considerable
>> performance boost. This was on a uni-processor machine running a
>> stress-testing app.
>>
>> Does anybody have any views? I'll happily explain in a bit more detail
>> why I think it isn't necessary on UP, but I guess I'm hoping for
>> similarly convincing views so I don't have to try to put it into words...
>>
> I'm not sure what Richter is using the thread id for, since checking for
> it is a debug option. Most of the stuff out of Redmond is rather strange
> and odd. I wouldn't look to it for good examples of doing anything.
>
> Most conventional solutions require the lock owner id to be atomic.
> Setting it and unsetting it is done while holding the lock. Unlocking
> code checks the owner id without a lock, since the only way it could
> match the current thread is if that thread owned the lock.
>

Yeah, that's what I was thinking (for a single processor). What I was
thinking could happen in the SMP case was something like:

1. Thread id 1234 runs on processor 1
   * Acquires lock
   * Releases lock
2. Thread id 5678 runs on processor 2
   * Acquires lock
3. Thread id 1234 runs on processor 3
   * Could the value of the thread id in the COptex still be 1234 from
     [1], confusing the recursion count? This thread would think it owns
     the lock but doesn't.

Except that (presumably) can't happen, because the interlocked ops from
the acquire in [2] and in the first step of [3] will cause the processor
caches to be coherent.
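For comparison, here is a minimal sketch of the conventional pattern Joseph describes, written in portable C++11 rather than Win32. It is not Richter's COptex: the class name `RecursiveLock`, its `owned_by_caller()` method, and the `run_demo()` helper are hypothetical. A plain `std::mutex` stands in for COptex's spin/event machinery, and `std::atomic<std::thread::id>` stands in for the DWORD owner id. The owner field is written only while the inner lock is held and is cleared before release. Other threads read it without the lock, so it must be atomic; in C++ a plain field would be a data race. Relaxed ordering on that field appears to be enough for the step-3 worry under the C++ memory model's per-object coherence rules. A thread that has itself stored "no owner" cannot later read back its own stale id, whichever processor it runs on.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical RecursiveLock: sketch of the "atomic owner id" pattern,
// not Richter's COptex. std::mutex is the non-recursive inner lock.
class RecursiveLock {
public:
    void lock() {
        const std::thread::id self = std::this_thread::get_id();
        // Unlocked read: owner_ can only equal 'self' if this thread stored it
        // while holding inner_ and has not yet cleared it, i.e. it owns the lock.
        if (owner_.load(std::memory_order_relaxed) == self) {
            ++recursion_;  // only the owning thread ever touches recursion_
            return;
        }
        inner_.lock();
        owner_.store(self, std::memory_order_relaxed);  // set while holding inner_
        recursion_ = 1;
    }

    void unlock() {
        assert(owned_by_caller());
        if (--recursion_ == 0) {
            // Clear the owner id *before* releasing, still under inner_.
            owner_.store(std::thread::id(), std::memory_order_relaxed);
            inner_.unlock();
        }
    }

    bool owned_by_caller() const {
        return owner_.load(std::memory_order_relaxed) == std::this_thread::get_id();
    }

private:
    std::mutex inner_;
    // A default-constructed thread::id means "no owner".
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int recursion_ = 0;
};

// Hypothetical demo: each thread takes the lock twice (nested) and bumps a
// shared counter. Returns the final count.
long run_demo(int threads, int iterations) {
    RecursiveLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                lock.lock();  // recursive acquire: must not deadlock
                ++counter;
                lock.unlock();
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

With four threads, `run_demo(4, 10000)` should return 40000. The design point is that the recursion count needs no atomics at all. Only the current owner reads or writes it, and ownership hand-off goes through the inner mutex. Whether a Win32 implementation can likewise drop the interlocked exchange depends on the hardware and compiler guarantees there, which this sketch does not settle.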