https://www.tomshardware.com/pc-components/cpus/spec-invalidates-2600-intel-cpu-benchmarks-says-companys-compiler-used-unfair-optimizations-that-boosted-performance

Industry group invalidates 2,600 official Intel CPU benchmarks -- SPEC says the company's compiler used unfair optimizations to boost performance

News By Matthew Connatser, published 16 February 2024

It mostly affects Sapphire Rapids benchmarks.

Sapphire Rapids (Image credit: Intel)

SPEC says it will no longer publish SPEC CPU 2017 results for Intel CPUs running a specific version of the Intel compiler, citing a targeted optimization for a specific workload (via ServeTheHome and Phoronix) that essentially amounts to cheating. A note has been added to the more than 2,600 benchmark results published with the offending compiler, effectively invalidating them; most come from machines running 4th Gen Xeon Sapphire Rapids CPUs.

SPEC CPU 2017 is a benchmark suite used mostly for high-end servers, data centers, and workstations/PCs, and it tests performance across various workloads in a standardized way so that different computers can be compared to each other. Good performance in SPEC CPU 2017 hinges not just on hardware but also on software. One of the key software-side factors is the compiler, the program that translates source code into machine code the processor can execute as efficiently as possible.

The disclaimer now attached to over 2,600 SPEC CPU 2017 results states: "The compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge." In other words, the compiler (in this case, Intel's oneAPI DPC++/C++ Compiler) was not optimized for the kind of workload the two SPEC CPU 2017 benchmarks in question test, but for the two benchmarks themselves.

Compilers are expected to be optimized, since more performance is obviously better, but optimizing specifically for benchmarks is controversial and frowned upon. SPEC wants its benchmarks to reflect the real-world performance of hardware and to provide a standardized way to compare different processors. If a compiler optimization improves performance only in a particular benchmark and not in comparable real-world workloads, the resulting score no longer reflects anything beyond the benchmark itself.

According to Phoronix, the optimization could boost overall SPECint performance by 9%. The publication also notes that versions 2022.0 to 2023.0 of the Intel oneAPI compiler are affected, meaning most of the now-invalidated results were run in 2022, largely on Sapphire Rapids CPUs.
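A detail worth making concrete: SPEC CPU 2017 composite scores are geometric means of per-benchmark ratios, which is how a large gain on a single test moves the headline number. Below is a minimal C++ sketch with made-up ratio values; the 2.37 factor is simply the single-benchmark gain that yields a roughly 9% composite lift across ten benchmarks (SPECint 2017 comprises ten), not a figure from any actual submission.

```cpp
// Sketch: how one benchmark's gain moves a SPEC-style composite score.
// SPEC CPU 2017 composites are geometric means of per-benchmark ratios;
// the values here are illustrative, not real submission data.
#include <cmath>
#include <cstdio>
#include <vector>

double geomean(const std::vector<double>& ratios) {
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);
    return std::exp(log_sum / ratios.size());
}

int main() {
    // Ten equal ratios stand in for the ten SPECint 2017 benchmarks.
    std::vector<double> base(10, 100.0);
    double before = geomean(base);

    // Inflate just one entry (say, 523.xalancbmk_r) by a factor of 2.37.
    std::vector<double> boosted = base;
    boosted[0] *= 2.37;
    double after = geomean(boosted);

    std::printf("composite before: %.1f, after: %.1f (+%.1f%%)\n",
                before, after, 100.0 * (after / before - 1.0));
    // Prints roughly +9%: 2.37^(1/10) is about 1.09, matching the
    // reported overall SPECint impact of the targeted optimization.
    return 0;
}
```

Put another way, for the compiler change to lift overall SPECint by the reported 9%, the xalancbmk score by itself would have had to more than double, which is consistent with SPEC's disclaimer singling out just those two benchmark IDs.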
Results for 5th Gen Xeon Emerald Rapids CPUs are very unlikely to have been produced with a compiler version containing the banned optimization, since Emerald Rapids came out after fixed versions of the compiler were available.

Benchmark-specific optimization has been a hot topic for years. Back in 2003, Nvidia was accused of a driver-side optimization that boosted its GPUs' performance in 3DMark 2003. In 2010, Nvidia itself alleged that AMD was cheating in actual games by not enabling certain driver-side settings that would have significantly improved visual quality at the expense of performance. Accusations these days don't get quite as heated, though SPEC has certainly shamed Intel in this case.

Comments from the forums (38)

* PEnns: "...essentially amounts to cheating" WHAT??, Intel cheating?? I am shocked, shocked!!

* TerryLaze: What the heck, SPEC? So basically what you are saying is that you are using useless benchmarks that don't target any kind of workload, or even specific applications, but then you are salty that somebody optimizes their compiler for it?! Also, how is SPEC NOT a specific application? It doesn't get any more specific than a benchmark.

[quoting the article] The disclaimer now attached to over 2,600 SPEC CPU 2017 results states: "The compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge." This means the compiler (in this case, Intel's oneAPI DPC++/C++ Compiler) was not optimized for a particular kind of workload, or even for specific applications, but specifically for two SPEC CPU 2017 benchmarks.

* -Fran-: I'll just leave this here. https://www.cnet.com/science/amd-quits-benchmark-group-implying-intel-bias/ Regards.

* bit_user:

[quoting the article] "The compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge." This means the compiler (in this case, Intel's oneAPI DPC++/C++ Compiler) was not optimized for a particular kind of workload, or even for specific applications, but specifically for two SPEC CPU 2017 benchmarks.
I think it's basically just one benchmark that's included in two different suites. Xalan is an XSLT processor developed under the umbrella of the Apache Software Foundation. As for the _r and _s distinction, these signify rate vs. speed. SPEC explains them as follows: "There are many ways to measure computer performance. Among the most common are: Time - For example, seconds to complete a workload. Throughput - Work completed per unit of time, for example, jobs per hour. SPECspeed is a time-based metric; SPECrate is a throughput metric." https://www.spec.org/cpu2017/Docs/overview.html#Q15

They further list several key differences, but these two jumped out at me: For speed, 1 copy of each benchmark in a suite is run; for rate, the tester chooses how many concurrent copies to run. For speed, the tester may choose how many OpenMP threads to use; for rate, OpenMP is disabled. I'm a little surprised by the latter point, but I guess it makes sense. What it means is that SPECspeed shouldn't be taken purely as a proxy for single-threaded performance. You really ought to use SPECrate for that, which I think is what I've seen.

[quoting the article] Results for 5th Gen Xeon Emerald Rapids CPUs are very unlikely to have been produced with a compiler version containing the banned optimization, since Emerald Rapids came out after fixed versions of the compiler were available.

I'm not sure how the author concludes this. I don't see anywhere to download previous versions of Intel's DPC++ compiler. The latest is 2024.0.2, and that release is dated Dec. 18th, 2023. BTW, I wonder who tipped them off. Did someone just notice those results were suspiciously good and start picking apart the generated code, or did a disgruntled ex-Intel employee maybe drop a dime?

* bit_user:

TerryLaze said: What the heck, SPEC?

Standard Performance Evaluation Corporation. [quoting spec.org] "The System Performance Evaluation Cooperative, now named the Standard Performance Evaluation Corporation (SPEC), was founded in 1988 by a small number of workstation vendors who realized that the marketplace was in desperate need of realistic, standardized performance tests. The key realization was that an ounce of honest data was worth more than a pound of marketing hype. SPEC publishes several hundred different performance results each quarter spanning a variety of system performance disciplines. The goal of SPEC is to ensure that the marketplace has a fair and useful set of metrics to differentiate candidate systems. The path chosen is an attempt to balance requiring strict compliance and allowing vendors to demonstrate their advantages. The belief is that a good test that is reasonable to utilize will lead to a greater availability of results in the marketplace. SPEC is a non-profit organization that establishes, maintains and endorses standardized benchmarks and tools to evaluate performance for the newest generation of computing systems. Its membership comprises more than 120 leading computer hardware and software vendors, educational institutions, research organizations, and government agencies worldwide." https://www.spec.org/spec/

One neat thing about SPECbench is that you actually get it in the form of source code that you can compile and run just about anywhere. For years, Anandtech even managed to run it on iPhone and Android phone SoCs. This allowed them to compare performance and efficiency relative to desktop x86 and other types of CPUs. As far as I'm aware, GeekBench is one of the only other modern, cross-platform benchmarks.
However, unlike SPECbench, it's basically a black box. This makes it a ripe target for allegations of bias toward one kind of CPU or platform vs. others.

TerryLaze said: so basically what you are saying is that you are using useless benchmarks that don't target any kind of workload, or even specific applications

No, SPECbench is composed of real-world, industry-standard applications.

TerryLaze said: then you are salty that somebody optimizes their compiler for it?!

Yes. The article explains that the benchmark suite is intended to be predictive of how a given system will perform on certain workloads. If a vendor does highly targeted compiler optimizations for the benchmark, those don't carry over to similar workloads and thus invalidate the benchmark. That undermines the whole point of SPECbench, which is why they need to take a hard line on this sort of activity.

* JamesJones44: Let's be honest, who believes any of the benchmarks released by any host company? Not a day goes by without an independent benchmark looking different from what Apple, Intel, AMD, Nvidia, Micron, etc. stated in their own benchmarks.

* bit_user:

JamesJones44 said: Let's be honest, who believes any of the benchmarks released by any host company? Not a day goes by without an independent benchmark looking different from what Apple, Intel, AMD, Nvidia, Micron, etc. stated in their own benchmarks.

That's not what this is about. SPEC gets submissions for an entire system. As such, they're usually submitted by OEMs and integrators. There's a natural tendency to use the compiler suite provided by the CPU maker, since those have all of the latest and greatest optimizations and tuning for the specific CPU model. That's where the trouble started. SPEC has various rules governing the way systems are supposed to be benchmarked, in order to be eligible for submission. It's a little like the Guinness Book of World Records, or perhaps certain athletics bodies and their rules concerning official world records.

* punkncat: What do you mean the new car I just purchased doesn't really get 40 MPG in real-world conditions? AGHAST! Why does this even qualify as news? This isn't anything novel or unheard of. Whispers about it have been going around for years now. If anyone is surprised, they are also naive... and I've got some beachfront property to sell you...

* TerryLaze:

bit_user said: Yes. The article explains that the benchmark suite is intended to be predictive of how a given system will perform on certain workloads. If a vendor does highly targeted compiler optimizations for the benchmark, those don't carry over to similar workloads and thus invalidate the benchmark. That undermines the whole point of SPECbench, which is why they need to take a hard line on this sort of activity.

I don't get the distinction... If it's predictive of possible compiler optimizations and Intel actually did those compiler optimizations, then what's the issue?! It can't be both ways: either these particular benches are useless, or Intel optimized the compiler toward whatever the predicted use case was.
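The distinction the last two comments argue about can be made concrete. SPEC's run rules require, roughly, that an optimization benefit a class of programs broader than the SPEC benchmarks themselves. The exact mechanism of Intel's change has not been detailed publicly, so the sketch below is purely hypothetical C++ (invented function names, not Intel compiler code) illustrating the category of behavior at issue: a tuning decision keyed to which program is being compiled rather than to measurable properties of the code.

```cpp
// Hypothetical illustration only -- not Intel's compiler, and not the
// actual mechanism of the flagged optimization.
#include <cstdio>
#include <string_view>

// Generic tuning: driven by measurable properties of the code being
// compiled, so it helps any program exhibiting the same pattern.
int pick_unroll_factor(int estimated_trip_count) {
    return estimated_trip_count > 1000 ? 8 : 2;
}

// Benchmark-keyed tuning (hypothetical): driven by *which program* is
// being compiled. The speedup cannot transfer to other XSLT workloads,
// which is why SPEC treats such "a priori knowledge" as inadmissible.
int pick_unroll_factor_keyed(std::string_view module_name,
                             int estimated_trip_count) {
    if (module_name.find("xalancbmk") != std::string_view::npos)
        return 16;  // special case that only the benchmark ever hits
    return pick_unroll_factor(estimated_trip_count);
}

int main() {
    std::printf("generic: %d, benchmark-keyed: %d\n",
                pick_unroll_factor(5000),
                pick_unroll_factor_keyed("523.xalancbmk_r", 5000));
    return 0;
}
```

Both paths "optimize," but only the first is predictive: a user compiling their own XSLT-heavy application would get the generic path, so a score earned via the keyed path overstates what that user will actually see.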