Subj : Re: hyperthreading in database-benchmarks
To : comp.arch,comp.programming.threads
From : Bill Todd
Date : Thu Oct 13 2005 03:31 am

David Kanter wrote:
> Bill Todd wrote:
>
>>Oliver S. wrote:
>>
>>>Has anyone found information on how much hyperthreading is able to
>>>improve the
>>>performance of database-workloads (OTP as well as DWH)?
>>
>>My recollection is that POWER5's SMT is said to give it something like a
>>35% boost in TPC-C, and Montecito's coarser-grained 'hyperthreading' is
>>said to provide less (more like 25%). Those of course are both
>>dual-thread SMT implementations without any more execution units than
>>their non-SMT predecessors: EV8's quad-thread implementation did (IIRC)
>>contain more execution units, was fine-grained, and was said to provide
>>over 2x (possibly as much as 3x - it's been a long time since I visited
>>the material) the TPC-C throughput that a non-SMT version would have
>>managed.

....

> It was estimated by Joel Emer at about a 225-230% boost

As a mathematician, you really ought to be more careful with your terminology (and this isn't the first time I've noticed that, which is why I'm commenting upon it): the 'boost' you're describing is 125% - 130%.

> (hard to tell
> with the graph and scale):
>
> www.cs.washington.edu/research/smt/papers/compaqMF.ppt
>
> This persuades me that a chip designed for SMT from the ground up can
> get quite a bit better than just 40%.

Well, even with the added execution units when running only two threads the EV8 managed less than 70% in the 'TP' workload described (and did even worse in some of the other workloads when limited to two threads): the ability to support four concurrent threads (and to keep them reasonably well-supplied with resources) was its most significant advantage.

> The real question is whether you
> are better off with CMP than a wide SMT...hard to say

Not really.
Even as cores continue to diminish in size, *some* level of SMT will remain desirable insofar as it allows one to put to good use more execution units whether to enhance the performance of a single thread or to enhance the performance of multiple concurrent threads within the single core (i.e., it provides a core which can handle a wider range of workloads more closely to optimally, rather than a static arrangement either starved for execution units when servicing a number of demanding threads lower than the number of cores or leaving execution units idle even when a number of far-less-demanding threads covers all the cores).

So the *real* question is whether that's *enough* of an improvement to justify the added design effort (and relatively small additional physical overheads - at least as evidenced by current examples) involved - and if the answer is 'yes', then just what level of multi-threading within the multiple separate cores on a chip is ideal across the normal distribution of real-world workloads (one can't just suggest that SMT could eliminate *all* need for CMP since wire and synchronization delays within a single core are non-negligible factors which bound total core size even if the complexity of, say, supporting many dozens of concurrent threads did not).

EV8 may have occupied a unique moment in time when placing multiple relatively high-performance cores on a single chip was not yet quite feasible but when the level of performance desired from a single thread was not yet so limited by the 'memory wall' that single-thread performance had ceased to be so desirable - at least if you could find good uses for the many execution units at other times as well to make the chip more generally useful. Even so, it should be some time yet before such considerations fade away completely (and if ever something pushes back that memory wall sufficiently they'll resurface).

- bill
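
The terminology point earlier in the post comes down to simple arithmetic: throughput quoted as 225-230% of the non-SMT baseline is a boost of 125-130% over it. A minimal Python sketch of the conversion follows; the function names are hypothetical and chosen only for illustration, and the figures are the ones quoted in the thread.

```python
def boost_percent(relative_percent):
    """Convert throughput given as a percentage of baseline
    (e.g. 225 means 2.25x the baseline) into a percentage boost."""
    return relative_percent - 100.0


def relative_percent(boost):
    """Inverse conversion: a percentage boost over baseline
    expressed as a percentage of the baseline."""
    return boost + 100.0


if __name__ == "__main__":
    # Figures quoted in the thread for EV8 relative to a non-SMT version.
    for rel in (225.0, 230.0):
        print(f"{rel:.0f}% of baseline is a {boost_percent(rel):.0f}% boost")
    # The POWER5 figure quoted in the thread is already stated as a boost.
    print(f"a 35% boost is {relative_percent(35.0):.0f}% of baseline")
```

Keeping the two quantities in separate functions makes it harder to compare a ratio-style figure (such as the 2x to 3x recollection) against a boost-style figure (such as 35% or 25%) without converting one of them first.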