Arm Introduces The Cortex-A715
June 28, 2022 | David Schor
Tags: ARM, ARMv9, Cortex, Cortex-A, Cortex-A710, Cortex-A715, Makalu

[cortex-a715-header]

Last year Arm introduced the Cortex-A710, the company's first ARMv9 implementation in a big core. As has been the tradition over the past few years around the May/June timeframe, today Arm is introducing the next-generation successor to the Cortex-A710 - the Cortex-A715, formerly known as Makalu.

---------------------------------------------------------------------
This article is part of a series of articles covering Arm's Client Tech Day 2022.
* Arm Refreshes The Cortex-A510, Squeezes Higher Efficiency
* Arm Introduces The Cortex-A715
* Arm Unveils Next-Gen Flagship Core: Cortex-X3
---------------------------------------------------------------------

Succeeding the Cortex-A710 as the newest big core, the A715 supports largely the same ARMv9.0 ISA with several enhancements. Perhaps more critically, the new core supports only AArch64, dropping 32-bit support altogether. The design principles for the A715 remain similar to those of the prior big core: improve performance at a higher ratio than the impact on power and area. With this iteration, the emphasis was placed on improving throughput without significantly widening the pipeline or extending its depth (although both took place to a degree). Finally, Arm engineers introduced targeted improvements - such as branch predictor and prefetching enhancements - that were inspired by earlier Cortex-X designs.

Power-Efficiency

Compared to the Cortex-A710, the new A715 is said to deliver a 5% performance improvement at iso-power. Likewise, at the same performance level as the A710, the A715 consumes 20% less power. Both comparisons are done at iso-process. Put differently, Arm says that the new Cortex-A715 can deliver the same performance as the first-generation Cortex-X1 core, which was Arm's flagship performance core in 2020.

[cortex-a715-perf]

Overall, it's clear that power reduction was the priority this generation - especially for sustained use cases. What's a bit unusual about this core is that the performance improvement seems underwhelming. It's not unheard of for Arm to alternate between a large performance uplift and a large power reduction (with a much smaller performance uplift), but in this particular case we were expecting a bigger uplift given the 2020 Arm TechCon announcement (later reiterated at the company's Vision Day last year), which promised up to 30% single-core performance over the Cortex-A78. Compared to the A78, in terms of IPC, the A715 lands somewhere around 15%. It's unclear why the discrepancy is so big. Nonetheless, the DVFS curve shown below indicates good power-efficiency gains across the entire performance spectrum.

[cortex-a715-dvfs]
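One possible, partial explanation worth noting: the 30% TechCon figure was framed as single-core performance, whereas the ~15% figure above is IPC. As a rough back-of-envelope check of our own (not Arm's numbers), single-thread performance factors into IPC and frequency, so the remainder of the original projection would have had to come from clock and process gains:

\[
\text{perf} = \text{IPC} \times f
\qquad\Rightarrow\qquad
\frac{f_{\mathrm{A715}}}{f_{\mathrm{A78}}} \approx \frac{1.30}{1.15} \approx 1.13
\]

In other words, the original claim is consistent with a ~15% IPC gain only if roughly another 13% were to come from frequency - a contribution that the iso-process comparisons above would not capture.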
Microarchitecture

Behind the scenes, quite a bit changed in a single generation. The vast majority of the changes took place in the front end of the core and in the memory subsystem.

Fetch

Arm spends a lot of time refining its prefetchers and branch predictors; it's part of the reason the company can maintain relatively small cache sizes. In this iteration, the direction predictor capacity was doubled and its accuracy improved. In the prior generation, the A710, the core was able to predict two unconditional branches per cycle. In the A715, this capability was extended to conditional branches as well. In other words, whereas the A710 could predict two unconditional branches but only one taken conditional branch per cycle, the A715 can now do two.

The other improvement in the A715 is a 3-stage prediction scheme for faster turnaround. Whereas previously Arm had a fast 0-cycle L0 predictor and a slower 2-cycle prediction structure, with the A715 Arm broke this down into three stages with a new intermediate structure offering a 1-cycle turnaround, reducing the latency to get predictions. With the higher-capacity branch predictor producing higher branch request bandwidth, it's possible to encounter more instances where two separate instruction streams are fetched. To accommodate this, the A715 now supports higher instruction cache lookup bandwidth - up to twice the tag lookups per cycle.

[arm-cortext-a715-fe]

Pure 64-bit Enables Different Tradeoffs

The new Cortex-A715 is a pure AArch64 implementation, and that means the design team could get rid of various architectural quirks and inefficiencies that came with the 32-bit architecture. Arm says that due to the more regular nature of AArch64, the new decoders could not only be designed and optimized more efficiently, but they are also considerably smaller. In fact, Arm says the new decoders are "4x smaller than the ones found in the Cortex-A710 with power-saving to match", which is quite remarkable.

A lot changed along with the new decoders. Firstly, Arm took the instruction fusion mechanism and moved it directly into the instruction cache; previously, the A710 performed fusion at the MOP cache. This means that all applications can now take advantage of fused instructions at the fetch level (i.e., benefit from the higher effective instruction throughput). Secondly, whereas previously some instructions could only be handled by specific decoders, now all decoders can handle all operations.

[arm-cortext-a715-decode]

Due to the smaller AArch64 decoder size, Arm added a fifth decode lane. In other words, the A715 fetch/decode bandwidth now matches the A710 MOP-cache bandwidth, while the instruction cache gained the MOP fusion capability. By moving many of the benefits of the MOP cache into the instruction cache and adding the extra decode lane, Arm says it was able to achieve similar performance without the MOP cache, and for this reason it was removed. Removing the cache also offered some area and power gains, albeit in terms of performance, the two fairly large design changes largely cancel each other out.

[a715-mop-decode]

Memory Subsystem

On the memory subsystem side, the Cortex-A715 grew the load replay queue - the structure that holds issued load accesses. Arm also doubled the number of data cache banks. With more banks, there are more read/write ports, allowing for a higher number of concurrent data accesses.

The last change in the A715 is that there are now 50% more L2 TLB entries, and along with that, Arm says each entry can now store double the virtual addresses (VAs), which means that under the right conditions it's possible to achieve up to 3x the effective TLB reach of the Cortex-A710.

[arm-cortext-a715-memsys]
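The 3x figure follows from multiplying the two changes. A quick worked example, using a hypothetical baseline for illustration (the actual entry counts and translation granule are not disclosed here):

\[
\text{TLB reach} \;\propto\; \text{entries} \times \frac{\text{VAs}}{\text{entry}}
\qquad\Rightarrow\qquad
1.5 \times 2 = 3\times
\]

That is, if a baseline L2 TLB held N entries each mapping a single page, the A715 arrangement would hold 1.5N entries each able to map two pages, tripling the address range covered before a page-table walk is required - hence the "under the right conditions" qualifier, since both gains must apply to the workload's access pattern.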
Looking forward, Arm disclosed two new cores for the next two years - Hunter and Chaberton. Software support for Neoverse Demeter and Cortex Hunter & Hayes started getting pushed out late last year.

[arm-cores-roadmap-2025]