Subj: Re: hyperthreading in database-benchmarks
To:   comp.arch, comp.programming.threads
From: Oliver S.
Date: Tue Oct 11 2005 07:33 pm

> I assume you're talking about running queries which would sequentially
> scan memory, which isn't cache's strong point since it's LRU-optimized.

Of course.

> And it doesn't look like the mfgrs are doubling up on cache for these
> systems, so no help here.

Any query workload that fits within the caches of a current CPU
architecture isn't worth mentioning, because it's small anyway. I'm
thinking of a heavy load of OLTP clients running a lot of query threads,
or a small load of DWH clients running queries on large data volumes.

> You could get help running a hardware scout thread or ganging the
> queries so they act as hardware scouts for each other, getting some
> synergism out of the process.

I think scouting is a nice idea, but if you have several threads on each
core which fill each other's stalls, the additional effect of scouting
becomes very small.

> There probably won't be gang-scheduling support from any of the OSes
> for a while, at least due to the instability of hw design at this point.

I don't think scheduling matters here, because with large data sets and
appropriate read-ahead, the query threads wouldn't give up their time
slices by doing I/O very often.

> Where are the cache hits occurring?
> On the index traversals or on the table data itself or on both?

On both, of course, but there are two different scenarios: with DWH
workloads there are a lot of linear accesses to the indices and to the
table blocks, so linear memory performance becomes prevalent, and
prefetching, which is simpler than scouting, will help. With OLTP
workloads the parts fetched from the data blocks are more scattered, and
prefetching won't help a lot.

BTW: I think I'll do a little benchmark with two threads doing
pointer-chasing in the next few days. Then we'll see what hyperthreading
can do in this worst case so often cited by SMT advocates.
BTW2: I'm currently writing a high-performance general-purpose memory
allocator; I'm pretty sure it will outperform most current allocators on
small block sizes.
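To give an idea of the direction (this is a generic textbook sketch, not my allocator, and the class name and size limits are made up): the classic trick for small-block speed is a free list per size class, carved out of larger slabs, so allocation and deallocation are just a push/pop on a singly linked list.

```cpp
// Sketch of a size-class free-list allocator for small blocks.
// For simplicity, slabs are never returned to the OS, the caller
// passes the block size on free, and nothing here is thread-safe.
#include <cstddef>
#include <cstdlib>
#include <new>

class SmallAlloc {
    static constexpr std::size_t kClasses  = 8;         // 8,16,...,64 bytes
    static constexpr std::size_t kSlabSize = 64 * 1024; // carve 64 KB at a time
    struct Node { Node* next; };
    Node* free_[kClasses] = {};

    // Map a request size (1..64) to its size class.
    static std::size_t cls(std::size_t n) { return (n + 7) / 8 - 1; }

    // Carve a fresh slab into blocks of class c and push them all.
    void refill(std::size_t c) {
        const std::size_t sz = (c + 1) * 8;
        char* slab = static_cast<char*>(std::malloc(kSlabSize));
        if (!slab) throw std::bad_alloc();
        for (std::size_t off = 0; off + sz <= kSlabSize; off += sz) {
            Node* n = reinterpret_cast<Node*>(slab + off);
            n->next = free_[c];
            free_[c] = n;
        }
    }

public:
    void* alloc(std::size_t n) {
        const std::size_t c = cls(n);
        if (c >= kClasses) return std::malloc(n);  // big blocks: fall back
        if (!free_[c]) refill(c);
        Node* node = free_[c];                     // pop the list head
        free_[c] = node->next;
        return node;
    }

    void free_small(void* p, std::size_t n) {
        const std::size_t c = cls(n);
        if (c >= kClasses) { std::free(p); return; }
        Node* node = static_cast<Node*>(p);        // push onto the list head
        node->next = free_[c];
        free_[c] = node;
    }
};
```

The free link lives inside the freed block itself, so there's no per-block header, and the LIFO order means a just-freed block is reused first while it's still warm in the cache.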