Subj : Re: JOB: Sandbridge Technologies, White Plains, NY
To   : comp.programming.threads
From : Joe Seigh
Date : Fri Jul 15 2005 11:51 am

Mayan Moudgill wrote:
> Joe Seigh wrote:
>
>> Mayan Moudgill wrote:
>
>> Realtime and good thread performance are somewhat antithetical. FIFO
>> locks perform less well than normal locks. Lock-free is a better way
>> to go.
>
> We've already got some lock-free algorithms in place. The ISA uses
> load-word-reserved/store-word-conditional as our "synchronization"
> primitives, so we can implement stuff like atomic increment, lock-free
> LIFOs etc.

There are other lock-free techniques as well, such as RCU, SMR hazard
pointers, and atomically thread-safe reference counting. So some of the
lock-free algorithms that you could only use with Java can be used in a
non-Java environment as well.

>> Plus most of the conventional solutions I've seen for priority
>> inversion are really suboptimal. Basically you get all your RT guarantees
>> by crippling everything so lower priority threads can't perform better
>> than high priority threads.
>
> Our biggest issue is the difference between "pinned" and "unpinned"
> threads. A pinned thread is one which is locked down to a hardware
> thread, and will not get switched. For performance, you want to use
> spin-based synchronization mechanisms as opposed to
> dequeuing/context-switching when a pinned thread needs to wait.

With locks for both kinds or just the pinned threads?

There are also some pretty interesting things you can do with ganged
scheduling on the same core, like producer/consumer without scheduler
intervention to keep the threads in "sync". With a conventional solution
you need to use condition variables to keep the threads in sync. With
hardware threads, you use (in Intel hyperthreading) the PAUSE instruction
to allocate more CPU resources to the thread that's falling behind, without
having to invoke the scheduler with all the overhead that entails.
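
To make the spin-based producer/consumer idea concrete, here is a minimal
sketch, not anything from Sandbridge's ISA or the fastcv package. It assumes
x86 with C11 atomics and pthreads, and uses the _mm_pause() intrinsic for
PAUSE. The names cpu_relax, slot_full, slot_value and run_handoff are made
up for illustration, and the non-x86 fallback is just a placeholder.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* x86 PAUSE: spin-wait hint */
#else
#define cpu_relax() ((void)0)     /* assumption: no hint on other ISAs */
#endif

#define N_ITEMS 100

/* One-slot mailbox shared by exactly one producer and one consumer. */
static atomic_int slot_full;      /* 0 = empty, 1 = holds a value */
static int slot_value;
static long consumer_sum;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < N_ITEMS; i++) {
        /* Spin until the consumer has emptied the slot. */
        while (atomic_load_explicit(&slot_full, memory_order_acquire))
            cpu_relax();
        slot_value = i;
        /* Release store publishes slot_value to the consumer. */
        atomic_store_explicit(&slot_full, 1, memory_order_release);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    long sum = 0;
    for (int i = 0; i < N_ITEMS; i++) {
        /* Spin until the producer has filled the slot. */
        while (!atomic_load_explicit(&slot_full, memory_order_acquire))
            cpu_relax();
        sum += slot_value;
        /* Hand the empty slot back to the producer. */
        atomic_store_explicit(&slot_full, 0, memory_order_release);
    }
    consumer_sum = sum;
    return NULL;
}

/* Runs both threads to completion and returns the sum the consumer saw. */
long run_handoff(void) {
    pthread_t p, c;
    atomic_store(&slot_full, 0);
    int rc = pthread_create(&c, NULL, consumer, NULL);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumer_sum;
}
```

Neither thread ever blocks in the kernel here; whichever side is waiting
just spins on the flag with PAUSE. This only makes sense when both threads
really are running at the same time (e.g. two hardware threads on one
core); on a single hardware thread the spinner burns its whole timeslice.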

>> And try to avoid preemptive signaling like Linux NPTL does. Unless
>> Posix mandates it as a scheduling point.
>
> kill and signalling have all kinds of ugliness associated with them. We
> don't have a notion of a process (though we were thinking of
> thread-groups) so we avoid the real ugliness, but even pthread_kill is
> pretty ugly.

I'm referring to pthread_cond_signal stuff. Linux preempts the signaler,
which can slow things down considerably. I've seen up to a 3x performance
boost from throwing in an extra sched_yield to "undo" the preemption on
Linux. This is with a simple producer/consumer file copy using two threads,
which is amazing considering file i/o is somewhat asynchronous due to
filesystem buffering.

Look at the fastcv package at http://atomic-ptr-plus.sourceforge.net/
The lock-free part of it gets you about a 10% improvement in performance
on a single processor. The sched_yield part will probably break when they
fix the futex race condition, since the return code from futex signal won't
be meaningful anymore.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.