Arm Introduces The Cortex-A715
June 28, 2022 | David Schor
Tags: ARM, ARMv9, Cortex, Cortex-A, Cortex-A710, Cortex-A715, Makalu

[cortex-a715-header]

Last year Arm introduced the Cortex-A710, the company's first ARMv9 implementation in a big core. As has been the tradition over the past few years around the May/June timeframe, today Arm is introducing the next-generation successor to the Cortex-A710 - the Cortex-A715, formerly known as Makalu.

---------------------------------------------------------------------
This article is part of a series of articles covering Arm's Client Tech Day 2022.
* Arm Refreshes The Cortex-A510, Squeezes Higher Efficiency
* Arm Introduces The Cortex-A715
* Arm Unveils Next-Gen Flagship Core: Cortex-X3
---------------------------------------------------------------------

Succeeding the Cortex-A710 as the newest big core, the A715 supports largely the same ARMv9.0 ISA with several enhancements. Perhaps more critically, the new core supports only AArch64, dropping 32-bit support altogether. The design principles for the A715 remain similar to those of the prior big core: improve performance at a higher ratio than the impact on power and area. With this iteration, the emphasis was placed on improving throughput without significantly widening the pipeline or extending its depth (although both took place to a degree). Finally, Arm engineers introduced targeted improvements - such as branch predictor and prefetching enhancements - that were inspired by earlier Cortex-X designs.

Power-Efficiency

Compared to the Cortex-A710, the new A715 is said to deliver a 5% performance improvement at iso-power. Likewise, at the same performance level as the A710, the A715 consumes 20% less power. Both comparisons are done at iso-process. Put differently, Arm says that the new Cortex-A715 can deliver the same performance as the first-generation Cortex-X1 core, which was Arm's flagship performance core in 2020.

[cortex-a715-perf]

Overall, it's clear that power reduction was the priority this generation - especially for sustained use cases. What's a bit unusual about this core is that the performance improvement seems underwhelming. It's not unheard of for Arm to alternate between a large performance uplift and a large power reduction (with a much smaller performance uplift), but in this particular case we were expecting a bigger uplift given the 2020 Arm TechCon announcement (later reiterated at the company's Vision Day last year), which promised up to 30% single-core performance over the Cortex-A78. Compared to the A78, in terms of IPC, the A715 lands somewhere around 15%. It's unclear why the discrepancy is so big. Nonetheless, the DVFS curve shown below indicates good power-efficiency gains across the entire performance spectrum.

[cortex-a715-dvfs]
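One possible, partial explanation worth noting: the 30% TechCon figure was framed as single-core performance, whereas the ~15% figure above is IPC. As a rough back-of-envelope check of our own (not Arm's numbers), single-thread performance factors into IPC and frequency, so the remainder of the original projection would have had to come from clock and process gains:

\[
\text{perf} = \text{IPC} \times f
\qquad\Rightarrow\qquad
\frac{f_{\mathrm{A715}}}{f_{\mathrm{A78}}} \approx \frac{1.30}{1.15} \approx 1.13
\]

In other words, the original claim is consistent with a ~15% IPC gain only if roughly another 13% were to come from frequency - a contribution that the iso-process comparisons above would not capture.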
Microarchitecture

Behind the scenes, quite a bit changed in a single generation. The vast majority of the changes took place in the front end of the core and in the memory subsystem.

Fetch

Arm spends a lot of time refining its prefetchers and branch predictors; it's part of the reason the company can maintain relatively small cache sizes. In this iteration, the direction predictor capacity was doubled and its accuracy improved. In the prior generation, the A710, the core was able to predict two unconditional branches per cycle. In the A715, this capability was extended to conditional branches as well. In other words, whereas the A710 could predict two unconditional branches but only one taken conditional branch per cycle, the A715 can now do two.

The other improvement in the A715 is a 3-stage prediction scheme for faster turnaround. Whereas previously Arm had a fast 0-cycle L0 predictor and a slower 2-cycle prediction structure, with the A715 Arm broke this down into three stages with a new intermediate structure offering a 1-cycle turnaround, reducing the latency to get predictions. With the higher-capacity branch predictor producing higher branch request bandwidth, it's possible to encounter more instances where two separate instruction streams are fetched. To accommodate this, the A715 now supports higher instruction cache lookup bandwidth - up to twice the tag lookups per cycle.

[arm-cortext-a715-fe]

Pure 64-bit Enables Different Tradeoffs

The new Cortex-A715 is a pure AArch64 implementation, and that means the design team could get rid of various architectural quirks and inefficiencies that came with the 32-bit architecture. Arm says that due to the more regular nature of AArch64, the new decoders could not only be designed and optimized more efficiently, but they are also considerably smaller. In fact, Arm says the new decoders are "4x smaller than the ones found in the Cortex-A710 with power-saving to match", which is quite remarkable.

A lot changed along with the new decoders. Firstly, Arm took the instruction fusion mechanism and moved it directly into the instruction cache; previously, the A710 performed fusion at the MOP cache. This means that all applications can now take advantage of fused instructions at the fetch level (i.e., benefit from the higher effective instruction throughput). Secondly, whereas previously some instructions could only be handled by specific decoders, now all decoders can handle all operations.

[arm-cortext-a715-decode]

Due to the smaller AArch64 decoder size, Arm added a fifth decode lane. In other words, the A715 fetch/decode bandwidth now matches the A710 MOP-cache bandwidth, while the instruction cache gained the MOP fusion capability. By moving many of the benefits of the MOP cache into the instruction cache and adding the extra decode lane, Arm says it was able to achieve similar performance without the MOP cache, and for this reason it was removed. Removing the cache also offered some area and power gains, albeit in terms of performance, the two fairly large design changes largely cancel each other out.

[a715-mop-decode]

Memory Subsystem

On the memory subsystem side, the Cortex-A715 grew the load replay queue - the structure that holds issued load accesses. Arm also doubled the number of data cache banks. With more banks, there are more read/write ports, allowing for a higher number of concurrent data accesses.

The last change in the A715 is that there are now 50% more L2 TLB entries, and along with that, Arm says each entry can now store double the virtual addresses (VAs), which means that under the right conditions it's possible to achieve up to 3x the effective TLB reach of the Cortex-A710.

[arm-cortext-a715-memsys]
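The 3x figure follows from multiplying the two changes. A quick worked example, using a hypothetical baseline for illustration (the actual entry counts and translation granule are not disclosed here):

\[
\text{TLB reach} \;\propto\; \text{entries} \times \frac{\text{VAs}}{\text{entry}}
\qquad\Rightarrow\qquad
1.5 \times 2 = 3\times
\]

That is, if a baseline L2 TLB held N entries each mapping a single page, the A715 arrangement would hold 1.5N entries each able to map two pages, tripling the address range covered before a page-table walk is required - hence the "under the right conditions" qualifier, since both gains must apply to the workload's access pattern.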
Looking forward, Arm disclosed two new cores for the next two years - Hunter and Chaberton. Software support for Neoverse Demeter and Cortex Hunter & Hayes started getting pushed out late last year.

[arm-cores-roadmap-2025]