Subj : Re: recursive mutexes
To   : comp.programming.threads
From : Uenal Mutlu
Date : Thu May 19 2005 07:56 pm

"Markus Elfring" wrote
> > Sounds interesting, but I would be convinced only after testing
> > their performance. And here I doubt it can be faster, because
> > I have experimented with such structures myself and have read papers
> > on this, but unfortunately the performance was very poor due to the
> > additional code checks one has to make. They add up and degrade
> > the performance.
>
> Would you like to publish the test cases that gave you these bad results?

I did only a basic study. After I saw its complexity and its limitations,
I went back to normal lock methods, because I was after a generally usable,
fast method for mutex operations.

> > This assumption is based on the fact that you need to add more code for
> > the checks. That is: more code must be executed; even just two or three
> > if statements can cost too much compared to a classical mutex method
> > using an atomic counter.
>
> Would you like to perform a more detailed analysis to show concrete numbers for the
> effects on factors like "code size", "execution speed", "memory consumption",
> "concurrency/parallelization" and "throughput"?

Sorry, I have not done testing that deep. With an atomic counter you usually
need just 4 bytes and no memory allocation etc., since that would be overkill
and hurt performance. I just measure the elapsed CPU clock ticks.
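
To make the "atomic counter" idea concrete, here is a minimal sketch of such
a lock. It is not the code I tested; the class name CounterLock is made up
for illustration, and the spin-and-yield wait is an assumption (a real mutex
would normally block in the kernel under contention). The whole state is one
32-bit atomic, which is typically 4 bytes, and nothing is allocated.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical illustration: a non-recursive lock whose entire state
// is a single 32-bit atomic counter (0 = free, 1 = held).
class CounterLock {
    std::atomic<std::int32_t> count_{0};

public:
    // Try to move the counter from 0 to 1; succeeds only if the lock is free.
    bool try_lock() noexcept {
        std::int32_t expected = 0;
        return count_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    // Spin until the lock is acquired, yielding the CPU between attempts.
    void lock() noexcept {
        while (!try_lock()) {
            std::this_thread::yield();
        }
    }

    // Release the lock by resetting the counter to 0.
    void unlock() noexcept {
        std::int32_t prev = count_.exchange(0, std::memory_order_release);
        assert(prev == 1);  // unlocking a lock that was not held is a bug
        (void)prev;         // silence unused warning when NDEBUG is defined
    }
};
```

The fast path is one compare-and-swap on lock() and one exchange on unlock().
A recursive variant would additionally have to check the owner and keep a
nesting count on every call, which is the kind of extra code discussed above.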

> What do you think about comparing your approach with the available non-blocking
> synchronization implementations?

I have yet to see one that is generally usable. As I understand it, such a
generally usable lock-free method cannot exist.

> By the way, the optimization technique "loop unrolling" can produce "more code" with
> improved runtime behaviour under specific conditions.
> Can you measure each statement sequence or function call with precise processor cycles and
> cache latencies to get an estimation for the time ranges?

My measurements are not that sophisticated. I simply measure the net effect
by timing the elapsed clock ticks after doing a few million iterations in a loop.
I have also looked over some papers on this to see its complexity and
weighed their limitations against their advantages. In the end I came to the
conclusion that it's not suitable for general use: too complicated, too costly
in terms of execution, and too limited in its use. I concluded that it wasn't
worth investing more time in this.
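
The kind of net-effect timing described above can be sketched roughly as
follows. This is not my actual test code: the helper names elapsed_ns and
ns_per_lock_unlock are invented here, and std::chrono::steady_clock stands in
for a raw CPU tick counter, which is platform specific. It measures only the
uncontended single-thread case, and an optimizing compiler may remove work
that has no visible effect, so the numbers are rough at best.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: run op() `iterations` times and return the total
// elapsed wall-clock time in nanoseconds.
template <typename Op>
std::int64_t elapsed_ns(std::uint64_t iterations, Op op) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}

// Average cost of one uncontended lock()/unlock() pair for any type that
// provides those two member functions.
template <typename Mutex>
double ns_per_lock_unlock(Mutex& m, std::uint64_t iterations) {
    assert(iterations > 0);  // avoid dividing by zero below
    const std::int64_t total = elapsed_ns(iterations, [&m] {
        m.lock();
        m.unlock();
    });
    return static_cast<double>(total) / static_cast<double>(iterations);
}
```

For example, calling ns_per_lock_unlock(m, 1000000) once with a std::mutex
and once with a std::recursive_mutex gives a first impression of what the
additional checks cost on a given machine.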

What's your experience?
