https://www.tomshardware.com/pc-components/cpus/spec-invalidates-2600-intel-cpu-benchmarks-says-companys-compiler-used-unfair-optimizations-that-boosted-performance

Industry group invalidates 2,600 official Intel CPU benchmarks -- SPEC says the company's compiler used unfair optimizations to boost performance

News By Matthew Connatser, published 16 February 2024

It mostly affects Sapphire Rapids benchmarks.

Sapphire Rapids (Image credit: Intel)

SPEC says it will no longer publish SPEC CPU 2017 results for Intel CPUs running a specific version of the Intel compiler, citing a targeted optimization for a specific workload (via ServeTheHome and Phoronix) that essentially amounts to cheating. A note has been added to the more than 2,600 benchmark results published with the offending compiler, effectively invalidating them; most come from machines running 4th Gen Xeon Sapphire Rapids CPUs.

SPEC CPU 2017 is a benchmark suite used mostly for high-end servers, data centers, and workstations/PCs, and it tests performance across various workloads in a standardized way so that different computers can be compared to each other. Good performance in SPEC CPU 2017 hinges not just on hardware but also on software. One of the key software-side factors is the compiler, the program that translates source code into machine code the processor can execute as efficiently as possible.

The disclaimer now attached to over 2,600 SPEC CPU 2017 results states: "The compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge." In other words, the compiler (in this case, Intel's oneAPI DPC++/C++ Compiler) was not optimized for the kind of workload the two SPEC CPU 2017 benchmarks in question test, but for the two benchmarks themselves.

Compilers are expected to be optimized, since more performance is obviously better, but optimizing specifically for benchmarks is controversial and frowned upon. SPEC wants its benchmarks to reflect the real-world performance of hardware and to provide a standardized way to compare different processors. If a compiler optimization improves performance only in a particular benchmark and not in comparable real-world workloads, the resulting score no longer reflects anything beyond the benchmark itself.

According to Phoronix, the optimization could boost overall SPECint performance by 9%. The publication also notes that versions 2022.0 to 2023.0 of the Intel oneAPI compiler are affected, meaning most of the now-invalidated results were run in 2022, largely on Sapphire Rapids CPUs.
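A detail worth making concrete: SPEC CPU 2017 composite scores are geometric means of per-benchmark ratios, which is how a large gain on a single test moves the headline number. Below is a minimal C++ sketch with made-up ratio values; the 2.37 factor is simply the single-benchmark gain that yields a roughly 9% composite lift across ten benchmarks (SPECint 2017 comprises ten), not a figure from any actual submission.

```cpp
// Sketch: how one benchmark's gain moves a SPEC-style composite score.
// SPEC CPU 2017 composites are geometric means of per-benchmark ratios;
// the values here are illustrative, not real submission data.
#include <cmath>
#include <cstdio>
#include <vector>

double geomean(const std::vector<double>& ratios) {
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);
    return std::exp(log_sum / ratios.size());
}

int main() {
    // Ten equal ratios stand in for the ten SPECint 2017 benchmarks.
    std::vector<double> base(10, 100.0);
    double before = geomean(base);

    // Inflate just one entry (say, 523.xalancbmk_r) by a factor of 2.37.
    std::vector<double> boosted = base;
    boosted[0] *= 2.37;
    double after = geomean(boosted);

    std::printf("composite before: %.1f, after: %.1f (+%.1f%%)\n",
                before, after, 100.0 * (after / before - 1.0));
    // Prints roughly +9%: 2.37^(1/10) is about 1.09, matching the
    // reported overall SPECint impact of the targeted optimization.
    return 0;
}
```

Put another way, for the compiler change to lift overall SPECint by the reported 9%, the xalancbmk score by itself would have had to more than double, which is consistent with SPEC's disclaimer singling out just those two benchmark IDs.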
Results for 5th Gen Xeon Emerald Rapids CPUs are very unlikely to have been produced with a compiler version containing the banned optimization, since Emerald Rapids came out after fixed versions of the compiler were available.

Benchmark-specific optimization has been a hot topic for years. Back in 2003, Nvidia was accused of a driver-side optimization that boosted its GPUs' performance in 3DMark 2003. In 2010, Nvidia itself alleged that AMD was cheating in actual games by not enabling certain driver-side settings that would have significantly improved visual quality at the expense of performance. Accusations these days don't get quite as heated, though SPEC has certainly shamed Intel in this case.

Comments from the forums (38)

* PEnns: "...essentially amounts to cheating" WHAT??, Intel cheating?? I am shocked, shocked!!

* TerryLaze: What the heck, SPEC? So basically what you are saying is that you are using useless benchmarks that don't target any kind of workload, or even specific applications, but then you are salty that somebody optimizes their compiler for it?! Also, how is SPEC NOT a specific application? It doesn't get any more specific than a benchmark.

[quoting the article] The disclaimer now attached to over 2,600 SPEC CPU 2017 results states: "The compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge." This means the compiler (in this case, Intel's oneAPI DPC++/C++ Compiler) was not optimized for a particular kind of workload, or even for specific applications, but specifically for two SPEC CPU 2017 benchmarks.

* -Fran-: I'll just leave this here. https://www.cnet.com/science/amd-quits-benchmark-group-implying-intel-bias/ Regards.

* bit_user:

[quoting the article] "The compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge." This means the compiler (in this case, Intel's oneAPI DPC++/C++ Compiler) was not optimized for a particular kind of workload, or even for specific applications, but specifically for two SPEC CPU 2017 benchmarks.
I think it's basically just one benchmark that's included in two different suites. Xalan is an XSLT processor developed under the umbrella of the Apache Software Foundation. As for the _r and _s distinction, these signify rate vs. speed. SPEC explains them as follows: "There are many ways to measure computer performance. Among the most common are: Time - For example, seconds to complete a workload. Throughput - Work completed per unit of time, for example, jobs per hour. SPECspeed is a time-based metric; SPECrate is a throughput metric." https://www.spec.org/cpu2017/Docs/overview.html#Q15

They further list several key differences, but these two jumped out at me: For speed, 1 copy of each benchmark in a suite is run; for rate, the tester chooses how many concurrent copies to run. For speed, the tester may choose how many OpenMP threads to use; for rate, OpenMP is disabled. I'm a little surprised by the latter point, but I guess it makes sense. What it means is that SPECspeed shouldn't be taken purely as a proxy for single-threaded performance. You really ought to use SPECrate for that, which I think is what I've seen.

[quoting the article] Results for 5th Gen Xeon Emerald Rapids CPUs are very unlikely to have been produced with a compiler version containing the banned optimization, since Emerald Rapids came out after fixed versions of the compiler were available.

I'm not sure how the author concludes this. I don't see anywhere to download previous versions of Intel's DPC++ compiler. The latest is 2024.0.2, and that release is dated Dec. 18th, 2023. BTW, I wonder who tipped them off. Did someone just notice those results were suspiciously good and start picking apart the generated code, or did a disgruntled ex-Intel employee maybe drop a dime?

* bit_user:

TerryLaze said: What the heck, SPEC?

Standard Performance Evaluation Corporation. [quoting spec.org] "The System Performance Evaluation Cooperative, now named the Standard Performance Evaluation Corporation (SPEC), was founded in 1988 by a small number of workstation vendors who realized that the marketplace was in desperate need of realistic, standardized performance tests. The key realization was that an ounce of honest data was worth more than a pound of marketing hype. SPEC publishes several hundred different performance results each quarter spanning a variety of system performance disciplines. The goal of SPEC is to ensure that the marketplace has a fair and useful set of metrics to differentiate candidate systems. The path chosen is an attempt to balance requiring strict compliance and allowing vendors to demonstrate their advantages. The belief is that a good test that is reasonable to utilize will lead to a greater availability of results in the marketplace. SPEC is a non-profit organization that establishes, maintains and endorses standardized benchmarks and tools to evaluate performance for the newest generation of computing systems. Its membership comprises more than 120 leading computer hardware and software vendors, educational institutions, research organizations, and government agencies worldwide." https://www.spec.org/spec/

One neat thing about SPECbench is that you actually get it in the form of source code that you can compile and run just about anywhere. For years, Anandtech even managed to run it on iPhone and Android phone SoCs. This allowed them to compare performance and efficiency relative to desktop x86 and other types of CPUs. As far as I'm aware, GeekBench is one of the only other modern, cross-platform benchmarks.
However, unlike SPECbench, it's basically a black box. This makes it a ripe target for allegations of bias toward one kind of CPU or platform vs. others.

TerryLaze said: so basically what you are saying is that you are using useless benchmarks that don't target any kind of workload, or even specific applications

No, SPECbench is composed of real-world, industry-standard applications.

TerryLaze said: then you are salty that somebody optimizes their compiler for it?!

Yes. The article explains that the benchmark suite is intended to be predictive of how a given system will perform on certain workloads. If a vendor does highly targeted compiler optimizations for the benchmark, those don't carry over to similar workloads and thus invalidate the benchmark. That undermines the whole point of SPECbench, which is why they need to take a hard line on this sort of activity.

* JamesJones44: Let's be honest, who believes any of the benchmarks released by any host company? Not a day goes by without an independent benchmark looking different from what Apple, Intel, AMD, Nvidia, Micron, etc. stated in their own benchmarks.

* bit_user:

JamesJones44 said: Let's be honest, who believes any of the benchmarks released by any host company? Not a day goes by without an independent benchmark looking different from what Apple, Intel, AMD, Nvidia, Micron, etc. stated in their own benchmarks.

That's not what this is about. SPEC gets submissions for an entire system. As such, they're usually submitted by OEMs and integrators. There's a natural tendency to use the compiler suite provided by the CPU maker, since those have all of the latest and greatest optimizations and tuning for the specific CPU model. That's where the trouble started. SPEC has various rules governing the way systems are supposed to be benchmarked, in order to be eligible for submission. It's a little like the Guinness Book of World Records, or perhaps certain athletics bodies and their rules concerning official world records.

* punkncat: What do you mean the new car I just purchased doesn't really get 40 MPG in real-world conditions? AGHAST! Why does this even qualify as news? This isn't anything novel or unheard of. Whispers about it have been going around for years now. If anyone is surprised, they are also naive... and I've got some beachfront property to sell you...

* TerryLaze:

bit_user said: Yes. The article explains that the benchmark suite is intended to be predictive of how a given system will perform on certain workloads. If a vendor does highly targeted compiler optimizations for the benchmark, those don't carry over to similar workloads and thus invalidate the benchmark. That undermines the whole point of SPECbench, which is why they need to take a hard line on this sort of activity.

I don't get the distinction... If it's predictive of possible compiler optimizations and Intel actually did those compiler optimizations, then what's the issue?! It can't be both ways: either these particular benches are useless, or Intel optimized the compiler toward whatever the predicted use case was.
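The distinction the last two comments argue about can be made concrete. SPEC's run rules require, roughly, that an optimization benefit a class of programs broader than the SPEC benchmarks themselves. The exact mechanism of Intel's change has not been detailed publicly, so the sketch below is purely hypothetical C++ (invented function names, not Intel compiler code) illustrating the category of behavior at issue: a tuning decision keyed to which program is being compiled rather than to measurable properties of the code.

```cpp
// Hypothetical illustration only -- not Intel's compiler, and not the
// actual mechanism of the flagged optimization.
#include <cstdio>
#include <string_view>

// Generic tuning: driven by measurable properties of the code being
// compiled, so it helps any program exhibiting the same pattern.
int pick_unroll_factor(int estimated_trip_count) {
    return estimated_trip_count > 1000 ? 8 : 2;
}

// Benchmark-keyed tuning (hypothetical): driven by *which program* is
// being compiled. The speedup cannot transfer to other XSLT workloads,
// which is why SPEC treats such "a priori knowledge" as inadmissible.
int pick_unroll_factor_keyed(std::string_view module_name,
                             int estimated_trip_count) {
    if (module_name.find("xalancbmk") != std::string_view::npos)
        return 16;  // special case that only the benchmark ever hits
    return pick_unroll_factor(estimated_trip_count);
}

int main() {
    std::printf("generic: %d, benchmark-keyed: %d\n",
                pick_unroll_factor(5000),
                pick_unroll_factor_keyed("523.xalancbmk_r", 5000));
    return 0;
}
```

Both paths "optimize," but only the first is predictive: a user compiling their own XSLT-heavy application would get the generic path, so a score earned via the keyed path overstates what that user will actually see.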