Subj : Re: JOB: Sandbridge Technologies, White Plains, NY
To   : comp.programming.threads
From : Mayan Moudgill
Date : Fri Jul 15 2005 02:43 pm

Joe Seigh wrote:

> Mayan Moudgill wrote:
> 
>> Joe Seigh wrote:
>>
>>> Mayan Moudgill wrote:
>>
>>
>>
>>> Realtime and good thread performance are somewhat antithetical.  FIFO
>>> locks perform less well than normal locks.  Lock-free is a better way
>>> to go.
>>
>>
>>
>> We've already got some lock-free algorithms in place. The ISA uses 
>> load-word-reserved/store-word-conditional as our "synchronization" 
>> primitives, so we can implement stuff like atomic increment, lock-free 
>> LIFOs etc.
> 
> 
> There are other lock-free techniques as well, such as RCU, SMR hazard
> pointers, and atomically thread-safe reference counting.  So some
> of the lock-free algorithms that you could only use with Java can
> be used in a non-Java environment as well.

I'm sure there exist techniques that might prove useful but that we 
have not implemented (or even considered). That's why we're looking to 
hire someone in this area.
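A minimal sketch of the atomic increment and lock-free LIFO mentioned above, assuming a C11 compiler with <stdatomic.h>. Portable C cannot spell load-word-reserved/store-word-conditional directly. On a machine whose only atomic primitive is that pair, a compiler would typically lower atomic_compare_exchange_weak to exactly such a retry loop, which is why the "weak" form is allowed to fail spuriously. The names counter_inc, lifo_push, lifo_pop, struct node and struct lifo are hypothetical, not Sandbridge's actual API.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Atomic increment written as an explicit read-modify-write retry loop.
   On an LL/SC machine the compare-exchange itself is typically a
   reserved load followed by a conditional store. */
static long counter_inc(_Atomic long *ctr)
{
    long old = atomic_load_explicit(ctr, memory_order_relaxed);
    /* On failure, 'old' is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak(ctr, &old, old + 1))
        ;
    return old + 1;
}

/* Hypothetical intrusive LIFO node and stack head. */
struct node {
    struct node *next;
    int value;
};

struct lifo {
    _Atomic(struct node *) top;
};

static void lifo_push(struct lifo *s, struct node *n)
{
    struct node *old = atomic_load(&s->top);
    do {
        n->next = old;           /* link the new node above the current top */
    } while (!atomic_compare_exchange_weak(&s->top, &old, n));
}

/* Caveat: written with compare-and-swap, this pop is exposed to the ABA
   problem, and reading old->next is only safe if popped nodes are not
   freed while another thread may still be reading them. */
static struct node *lifo_pop(struct lifo *s)
{
    struct node *old = atomic_load(&s->top);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&s->top, &old, old->next))
        ;
    return old;
}
```

The pop side is where the hazard pointers and RCU mentioned earlier come in: they decide when a popped node can safely be freed.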

Our programming is done in C. However, since we wrote our own compiler, we 
have a fair amount of control over the dialect we recognize and the 
kinds of code we emit:
- the C compiler is slightly pthread-aware, and it's planned to do more 
cross-thread semantic analysis
- the C compiler generates multi-threaded code.



>>
>>> Plus most of the conventional solutions I've seen for priority
>>> inversion are really suboptimal.  Basically you get all your RT guarantees
>>> by crippling everything so lower-priority threads can't perform better
>>> than high-priority threads.
>>
>> Our biggest issue is the difference between "pinned" and "unpinned" 
>> threads. A pinned thread is one which is locked down to a hardware 
>> thread, and will not get switched out. For performance, you want to use 
>> spin-based synchronization mechanisms as opposed to 
>> dequeuing/context-switching when a pinned thread needs to wait.
> 
> 
> With locks for both kinds or just the pinned threads?

The problem is roughly this: if you're trying to acquire a lock 
(a pthread_mutex or something similar) and the resource is already 
locked, then if you're pinned you should spin; otherwise you might 
consider doing a context-switch.
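A sketch of the pinned/unpinned choice described above, assuming C11 atomics and POSIX sched_yield(). The type hybrid_lock and its functions are hypothetical; the caller passes in whether it is pinned, since how a thread learns that is platform-specific. A real implementation for unpinned threads would put the waiter on a queue and context-switch; sched_yield() is only a stand-in for that here.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-test-and-set lock with a pinned/unpinned wait policy. */
typedef struct {
    atomic_bool held;
} hybrid_lock;

static void hybrid_lock_init(hybrid_lock *l)
{
    atomic_init(&l->held, false);
}

static void hybrid_lock_acquire(hybrid_lock *l, bool pinned)
{
    for (;;) {
        /* Try to take the lock: exchange returns the previous value. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
        /* Wait with plain loads so the cache line is not hammered by writes. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed)) {
            if (!pinned)
                sched_yield();  /* unpinned: give up the CPU (stand-in for blocking) */
            /* pinned: keep spinning; no other thread can use this hardware thread */
        }
    }
}

static void hybrid_lock_release(hybrid_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```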

Moreover, if you're communicating between two pinned threads (e.g., a 
producer/consumer pair), then the lowest-overhead synchronization mechanism 
is simply a pair of flag variables.
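One plausible reading of the "pair of flag variables" scheme, sketched with C11 atomics: a single-slot channel in which each flag has exactly one writer, so neither side needs an atomic read-modify-write, only acquire loads and release stores. The names flag_channel, chan_send and chan_recv are hypothetical. Both sides busy-wait, which only makes sense when both threads are pinned.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-slot channel. prod_flag is written only by the
   producer, cons_flag only by the consumer. The slot is empty when the
   flags are equal and full when they differ. */
struct flag_channel {
    atomic_bool prod_flag;
    atomic_bool cons_flag;
    int data;
};

static void chan_init(struct flag_channel *ch)
{
    atomic_init(&ch->prod_flag, false);
    atomic_init(&ch->cons_flag, false);
    ch->data = 0;
}

static void chan_send(struct flag_channel *ch, int value)
{
    bool p = atomic_load_explicit(&ch->prod_flag, memory_order_relaxed);
    /* Spin until the consumer has taken the previous item (flags equal). */
    while (atomic_load_explicit(&ch->cons_flag, memory_order_acquire) != p)
        ;
    ch->data = value;
    /* Release store: the write to data is visible before the flag flips. */
    atomic_store_explicit(&ch->prod_flag, !p, memory_order_release);
}

static int chan_recv(struct flag_channel *ch)
{
    bool c = atomic_load_explicit(&ch->cons_flag, memory_order_relaxed);
    /* Spin until the producer has published a new item (flags differ). */
    while (atomic_load_explicit(&ch->prod_flag, memory_order_acquire) == c)
        ;
    int value = ch->data;
    /* Hand the slot back to the producer. */
    atomic_store_explicit(&ch->cons_flag, !c, memory_order_release);
    return value;
}
```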

There end up being a whole series of trade-offs in the actual 
implementation of synchronization primitives. We may need several APIs:
- a default POSIX-compliant jack-of-all-trades version
- an optimized version for handling pinned threads
- code sequences generated inline by the compiler based on analysis

> There are also some pretty interesting things you can do with ganged
> scheduling on the same core, like producer/consumer without scheduler
> intervention to keep the threads in "sync".

Hadn't really considered gang-scheduling. With 32 hardware threads and 
(hard) real-time requirements, we have found that things fall into two categories:
- hard real-time: pin the threads
- non-real-time: context-switched, pre-emptable, priority-based vanilla 
POSIX threads.
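For comparison with a general-purpose OS (this is not Sandbridge's own mechanism, which isn't described here): on Linux with glibc, the nearest everyday analogue of pinning is restricting a thread's CPU affinity with the non-portable GNU extension pthread_setaffinity_np(). It is weaker than the pinning described above, since unless other threads are kept off that CPU the kernel can still time-slice them onto it. The helper pin_to_current_cpu is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: bind the calling thread to the CPU it is currently
   running on. Returns 0 on success or an error number. */
static int pin_to_current_cpu(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        return errno;            /* sched_getcpu sets errno on failure */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}
```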

Gang-scheduling would be an interesting idea to implement if we needed 
something in the middle.

> 
> I'm referring to pthread_cond_signal stuff.  Linux preempts the signaler
> which can slow things down considerably.

I don't think we pre-empt the signaller.

Even if we do, context-switch costs are much lower on our processor than 
on a traditional processor for a variety of reasons (one of which is 
that we don't have virtual memory).
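On the pthread_cond_signal point: if the wakeup causes the signaler to be preempted while it still holds the mutex, the woken thread immediately blocks on that mutex again. POSIX permits calling pthread_cond_signal() without holding the mutex, so one common mitigation is to change the predicate under the mutex, unlock, and only then signal. A minimal sketch; struct event, event_post and event_wait are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event built from a mutex and condition variable. */
struct event {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool posted;
};

static void event_init(struct event *e)
{
    pthread_mutex_init(&e->mu, NULL);
    pthread_cond_init(&e->cv, NULL);
    e->posted = false;
}

static void event_post(struct event *e)
{
    pthread_mutex_lock(&e->mu);
    e->posted = true;             /* change the predicate under the mutex */
    pthread_mutex_unlock(&e->mu);
    /* Signal after unlocking: if the wakeup preempts this thread in favour
       of the waiter, the mutex is already free. Only safe if the waiter
       cannot destroy 'e' before this call returns. */
    pthread_cond_signal(&e->cv);
}

static void event_wait(struct event *e)
{
    pthread_mutex_lock(&e->mu);
    while (!e->posted)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cv, &e->mu);
    e->posted = false;            /* consume the event */
    pthread_mutex_unlock(&e->mu);
}
```

Because the predicate is always read and written under the mutex, signaling outside it cannot lose a wakeup here: a waiter that arrives late simply sees posted already set.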