Subj: Re: hyperthreading in database-benchmarks
To:   comp.arch, comp.programming.threads
From: Oliver S.
Date: Tue Oct 11 2005 07:33 pm

> I assume you're talking about running queries which would sequentially
> scan memory, which isn't cache's strong point since it's LRU-optimized.

Of course.

> And it doesn't look like the mfgrs are doubling up on cache for these
> systems, so no help here.

Any query workload that fits within the caches of a current CPU
architecture isn't worth mentioning, because it's small anyway. I'm
thinking of a heavy load of OLTP clients running a lot of query threads,
or a small load of DWH clients running queries on large data volumes.

> You could get help running a hardware scout thread or ganging the
> queries so they act as hardware scouts for each other, getting some
> synergism out of the process.

I think scouting is a nice idea, but if you have several threads on each
core which fill each other's stalls, the additional effect of scouting
becomes very small.

> There probably won't be gang-scheduling support from any of the OSes
> for a while, at least due to the instability of hw design at this point.

I don't think scheduling matters here, because with large data sets and
appropriate read-ahead, the query threads wouldn't give up their time
slices by doing I/O very often.

> Where are the cache hits occurring?
> On the index traversals or on the table data itself or on both?

On both, of course, but there are two different scenarios: with DWH
workloads there are a lot of linear accesses to the indices and to the
table blocks, so linear memory performance becomes prevalent, and
prefetching, which is simpler than scouting, will help. With OLTP
workloads the parts fetched from the data blocks are more scattered, and
prefetching won't help a lot.

BTW: I think I'll do a little benchmark with two threads doing
pointer-chasing in the next few days. Then we'll see what hyperthreading
can do in this worst case so often cited by SMT advocates.
BTW2: I'm currently writing a high-performance general-purpose memory
allocator; I'm pretty sure it will outperform most current allocators on
small block sizes.
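To give an idea of the direction (this is a generic textbook sketch, not my allocator, and the class name and size limits are made up): the classic trick for small-block speed is a free list per size class, carved out of larger slabs, so allocation and deallocation are just a push/pop on a singly linked list.

```cpp
// Sketch of a size-class free-list allocator for small blocks.
// For simplicity, slabs are never returned to the OS, the caller
// passes the block size on free, and nothing here is thread-safe.
#include <cstddef>
#include <cstdlib>
#include <new>

class SmallAlloc {
    static constexpr std::size_t kClasses  = 8;         // 8,16,...,64 bytes
    static constexpr std::size_t kSlabSize = 64 * 1024; // carve 64 KB at a time
    struct Node { Node* next; };
    Node* free_[kClasses] = {};

    // Map a request size (1..64) to its size class.
    static std::size_t cls(std::size_t n) { return (n + 7) / 8 - 1; }

    // Carve a fresh slab into blocks of class c and push them all.
    void refill(std::size_t c) {
        const std::size_t sz = (c + 1) * 8;
        char* slab = static_cast<char*>(std::malloc(kSlabSize));
        if (!slab) throw std::bad_alloc();
        for (std::size_t off = 0; off + sz <= kSlabSize; off += sz) {
            Node* n = reinterpret_cast<Node*>(slab + off);
            n->next = free_[c];
            free_[c] = n;
        }
    }

public:
    void* alloc(std::size_t n) {
        const std::size_t c = cls(n);
        if (c >= kClasses) return std::malloc(n);  // big blocks: fall back
        if (!free_[c]) refill(c);
        Node* node = free_[c];                     // pop the list head
        free_[c] = node->next;
        return node;
    }

    void free_small(void* p, std::size_t n) {
        const std::size_t c = cls(n);
        if (c >= kClasses) { std::free(p); return; }
        Node* node = static_cast<Node*>(p);        // push onto the list head
        node->next = free_[c];
        free_[c] = node;
    }
};
```

The free link lives inside the freed block itself, so there's no per-block header, and the LIFO order means a just-freed block is reused first while it's still warm in the cache.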