Subj : Re: JOB: Sandbridge Technologies, White Plains, NY
To   : comp.programming.threads
From : Joe Seigh
Date : Fri Jul 15 2005 11:51 am

Mayan Moudgill wrote:
> Joe Seigh wrote:
> 
>> Mayan Moudgill wrote:
> 
> 
>> Realtime and good thread performance are somewhat antithetical.  FIFO
>> locks perform less well than normal locks.  Lock-free is a better way
>> to go.
> 
> 
> We've already got some lock-free algorithms in place. The ISA uses 
> load-word-reserved/store-word-conditional as our "synchronization" 
> primitives, so we can implement stuff like atomic increment, lock-free 
> LIFOs etc.

There are other lock-free techniques as well, such as RCU, SMR hazard
pointers, and atomically thread-safe reference counting.  These handle
the memory reclamation that Java leaves to its garbage collector, so some
of the lock-free algorithms that you could otherwise only use with Java
can be used in a non-Java environment as well.
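
For reference, a minimal sketch of the atomic increment and lock-free LIFO
mentioned in the quoted text, written with C11 atomics rather than any
particular ISA.  On a load-reserved/store-conditional machine the compiler
would typically turn the compare-exchange loop into such a pair, but that is
an assumption about the toolchain.  The names (lifo_push, lifo_take_all,
atomic_inc) are made up for illustration.  The sketch only detaches the whole
list at once, because popping single nodes safely needs one of the
reclamation schemes above (hazard pointers, RCU, or similar).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node and stack types, for illustration only. */
struct node {
    struct node *next;
    int value;
};

struct lifo {
    _Atomic(struct node *) head;
};

/* Push one node.  The weak compare-exchange may fail spuriously
 * (as a store-conditional can), so it sits in a retry loop.  On
 * failure, 'old' is reloaded with the current head. */
static void lifo_push(struct lifo *s, struct node *n)
{
    assert(n != NULL);
    struct node *old = atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the entire list with a single atomic exchange.  Taking
 * everything at once sidesteps the ABA problem that a one-node
 * pop would have to deal with. */
static struct node *lifo_take_all(struct lifo *s)
{
    return atomic_exchange_explicit(&s->head, NULL, memory_order_acquire);
}

/* Atomic increment; returns the new value. */
static int atomic_inc(atomic_int *counter)
{
    return atomic_fetch_add_explicit(counter, 1, memory_order_relaxed) + 1;
}
```

The list returned by lifo_take_all is in LIFO order (most recent push first)
and belongs to the caller, which can walk it without further synchronization.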

> 
>> Plus most of the conventional solutions I've seen for priority
>> inversion are really suboptimal.  Basically you get all your RT guarantees
>> by crippling everything so lower priority threads can't perform better
>> than high priority threads.
> 
> 
> Our biggest issue is the difference between "pinned" and "unpinned" 
> threads. A pinned thread is one which is locked down to a hardware 
> thread, and will not get switched. For performance, you want to use 
> spin-based synchronization mechanisms as opposed to 
> dequeuing/context-switching when a pinned thread needs to wait.

Is that with spin locks for both kinds of thread, or just for the pinned ones?

There are also some pretty interesting things you can do with gang scheduling
on the same core, like producer/consumer without scheduler intervention to
keep the threads in "sync".  With a conventional solution you need to
use condition variables to keep the threads in sync.  With hardware threads,
the thread that's ahead spins using (in Intel hyperthreading) the PAUSE
instruction, which frees up more cpu resources for the thread that's falling
behind without having to invoke the scheduler with all the overhead that
entails.
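
A rough sketch of that kind of scheduler-free producer/consumer: a
single-producer/single-consumer ring where whichever side has nothing to do
spins on PAUSE instead of blocking.  The ring layout, the names (ring_put,
ring_get, run_pair, cpu_relax) and RING_SIZE are assumptions made for the
example, and nothing here pins the two threads to the same core.
__builtin_ia32_pause() is the GCC/Clang builtin that emits PAUSE on x86; on
other targets the sketch falls back to a plain spin.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() __builtin_ia32_pause()  /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)               /* assumption: no hint on this target */
#endif

#define RING_SIZE 64                        /* illustrative capacity */

struct ring {
    atomic_size_t head;                     /* next slot to write (producer) */
    atomic_size_t tail;                     /* next slot to read (consumer) */
    int slot[RING_SIZE];
};

/* Producer side: spin with PAUSE while the ring is full. */
static void ring_put(struct ring *r, int v)
{
    size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    while (h - atomic_load_explicit(&r->tail, memory_order_acquire) == RING_SIZE)
        cpu_relax();
    r->slot[h % RING_SIZE] = v;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
}

/* Consumer side: spin with PAUSE while the ring is empty. */
static int ring_get(struct ring *r)
{
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    while (atomic_load_explicit(&r->head, memory_order_acquire) == t)
        cpu_relax();
    int v = r->slot[t % RING_SIZE];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return v;
}

struct job {
    struct ring *r;
    int count;
    long sum;                               /* written by the consumer only */
};

static void *producer(void *arg)
{
    struct job *j = arg;
    for (int i = 1; i <= j->count; i++)
        ring_put(j->r, i);
    return NULL;
}

static void *consumer(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->count; i++)
        j->sum += ring_get(j->r);
    return NULL;
}

/* Pass 1..count through the ring; return the sum the consumer saw. */
static long run_pair(int count)
{
    struct ring r;
    struct job j = { &r, count, 0 };
    pthread_t prod, cons;

    atomic_init(&r.head, 0);
    atomic_init(&r.tail, 0);
    int rc = pthread_create(&cons, NULL, consumer, &j);
    assert(rc == 0);
    rc = pthread_create(&prod, NULL, producer, &j);
    assert(rc == 0);
    (void)rc;
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return j.sum;
}
```

Calling run_pair(n) pushes the integers 1..n through the ring.  Pure spinning
like this only makes sense when both threads really are running at the same
time; on a single hardware thread the spinner just burns its timeslice.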
> 
> 
>> And try to avoid preemptive signaling like Linux NPTL does.  Unless
>> Posix mandates it as a scheduling point.
> 
> 
> kill and signalling have all kinds of ugliness associated with them. We 
> don't have a notion of a process (though we were thinking of 
> thread-groups) so we avoid the real ugliness, but even pthread_kill is 
> pretty ugly.

I'm referring to the pthread_cond_signal stuff.  Linux preempts the signaler,
which can slow things down considerably.  I've seen up to a 3x performance
boost from throwing in an extra sched_yield to "undo" the preemption on Linux.
That was with a simple producer/consumer file copy using two threads, which
is amazing considering that file i/o is already somewhat asynchronous due to
filesystem buffering.  Look at the fastcv package at
http://atomic-ptr-plus.sourceforge.net/
The lock-free part of it gets you about a 10% improvement in performance
on a single processor.  The sched_yield part will probably break when they
fix the futex race condition, since the return code from futex signal won't
be meaningful anymore.
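
For anyone who wants to experiment with the effect, here is a minimal sketch
of a two-thread producer/consumer handoff with an optional sched_yield after
the signal.  It is not the fastcv code: judging from the remark about the
futex return code, fastcv decides whether to yield based on that return
value, which plain pthreads does not expose, so this sketch simply yields
after every signal when asked to.  Where exactly the yield belongs is an
assumption, and the names (mailbox, run_copy, yield_after_signal) are
invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox shared by one producer and one consumer. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    bool full;
    int value;
    bool yield_after_signal;    /* set once before the threads start */
};

static void mailbox_put(struct mailbox *m, int v)
{
    pthread_mutex_lock(&m->lock);
    while (m->full)
        pthread_cond_wait(&m->changed, &m->lock);
    m->value = v;
    m->full = true;
    pthread_cond_signal(&m->changed);
    pthread_mutex_unlock(&m->lock);
    if (m->yield_after_signal)
        sched_yield();          /* assumption: yield right after signaling */
}

static int mailbox_get(struct mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    while (!m->full)
        pthread_cond_wait(&m->changed, &m->lock);
    int v = m->value;
    m->full = false;
    pthread_cond_signal(&m->changed);
    pthread_mutex_unlock(&m->lock);
    if (m->yield_after_signal)
        sched_yield();
    return v;
}

struct copy_job {
    struct mailbox *m;
    int count;
    long sum;                   /* written by the consumer only */
};

static void *copy_producer(void *arg)
{
    struct copy_job *j = arg;
    for (int i = 1; i <= j->count; i++)
        mailbox_put(j->m, i);
    return NULL;
}

static void *copy_consumer(void *arg)
{
    struct copy_job *j = arg;
    for (int i = 0; i < j->count; i++)
        j->sum += mailbox_get(j->m);
    return NULL;
}

/* Hand 1..count from producer to consumer; return the consumer's sum. */
static long run_copy(int count, bool yield_after_signal)
{
    struct mailbox m;
    struct copy_job j = { &m, count, 0 };
    pthread_t prod, cons;

    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.changed, NULL);
    m.full = false;
    m.value = 0;
    m.yield_after_signal = yield_after_signal;

    int rc = pthread_create(&cons, NULL, copy_consumer, &j);
    assert(rc == 0);
    rc = pthread_create(&prod, NULL, copy_producer, &j);
    assert(rc == 0);
    (void)rc;
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    pthread_cond_destroy(&m.changed);
    pthread_mutex_destroy(&m.lock);
    return j.sum;
}
```

Timing run_copy(n, true) against run_copy(n, false) on a given kernel is the
way to see whether the yield helps there.  The sketch makes no claim about
the size or even the direction of the difference.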


-- 
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
