VPR: Nordic's First RISC-V Processor

December 24, 2024 · 13-minute read

VPR (pronounced "Viper") is Nordic Semiconductor's first RISC-V processor, landing in the new nRF54H and nRF54L lines of SoCs after their initial announcements in April and October of 2023 respectively. Readers of this blog are familiar with my long-running ~~obsession~~ interest in RISC-V (see my RISC-V Tips and RISC-V Bytes series). However, Nordic's introduction of a RISC-V processor is particularly interesting to me because their lineup of microcontrollers is extremely popular in low power wireless domains, a common use case for Golioth customers. Naturally, I was eager to look into the details of VPR as more information became available.

A basic description of registers and initialization can be found in the nRF54L documentation. VPR is an RV32E processor, meaning that it uses 32-bit registers but implements only the lower 16 registers required by the embedded (E) specification rather than the 32 defined by the full 32-bit integer (RV32I) specification. It also implements multiplication and division operations (M), as well as the compressed instruction (C) extension, which adds 16-bit instruction variants to improve code density. All instructions are executed in machine mode (M-mode), which is the only implemented privilege level.

Alongside the Arm Cortex-M33 application processor, nRF54L MCUs include a single VPR, referred to as the fast lightweight peripheral processor (FLPR, pronounced "flipper"), while the nRF54H20 includes both a FLPR and a peripheral processor (PPR, pronounced "pepper") alongside its dual Arm Cortex-M33 application and network processors. The PPR is intended for low power peripheral handling and runs at 16 MHz. The FLPR runs at 320 MHz and is intended for software-defined peripherals.

VPR processors must be configured and started by their controlling application processor. This consists of setting the VPR's program counter via the INITPC register, then writing to the CPURUN register to start the processor. On the nRF54H20, the PPR VPR executes code from slow global RAM (RAM3x), so the controlling processor must also copy the firmware into the appropriate region before updating the program counter and starting the VPR.
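To make that sequence concrete, here is a minimal sketch of what a controlling processor does to launch a VPR, using the hal_nordic helpers that appear later in this post. The header path and the destination address are assumptions for illustration; in practice the addresses come from devicetree, as we will see below.

```c
#include <stdint.h>
#include <string.h>

#include <hal/nrf_vpr.h> /* assumed hal_nordic header providing the nrf_vpr_* helpers */

/* Minimal sketch, not the actual Zephyr driver: copy a VPR firmware image
 * into executable memory, point the VPR at it, and start the core. */
static void launch_vpr(NRF_VPR_Type *vpr, const void *image, size_t size)
{
    /* Placeholder destination; the real region (e.g. RAM3x for the
     * nRF54H20 PPR) is described by devicetree. */
    uintptr_t exec_addr = 0x2fc00000UL;

    /* 1. Copy the firmware image into memory the VPR can execute from. */
    memcpy((void *)exec_addr, image, size);

    /* 2. Set the VPR's initial program counter (INITPC). */
    nrf_vpr_initpc_set(vpr, exec_addr);

    /* 3. Write CPURUN to release the core. */
    nrf_vpr_cpurun_set(vpr, true);
}
```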
I have previously written about the Zephyr boot process, and specifically about hardware initialization. Zephyr uses devicetree to describe the hardware and peripherals of supported processors and boards. In the nrf54h20.dtsi devicetree include file, there is a node representing the PPR VPR processor.

```dts
cpuppr: cpu@d {
    compatible = "nordic,vpr";
    reg = <13>;
    device_type = "cpu";
    clocks = <&fll16m>;
    clock-frequency = <DT_FREQ_M(16)>;
    riscv,isa = "rv32emc";
    nordic,bus-width = <32>;

    cpuppr_vevif_rx: mailbox {
        compatible = "nordic,nrf-vevif-task-rx";
        status = "disabled";
        interrupt-parent = <&cpuppr_clic>;
        interrupts = <0 NRF_DEFAULT_IRQ_PRIORITY>,
                     <1 NRF_DEFAULT_IRQ_PRIORITY>,
                     <2 NRF_DEFAULT_IRQ_PRIORITY>,
                     <3 NRF_DEFAULT_IRQ_PRIORITY>,
                     <4 NRF_DEFAULT_IRQ_PRIORITY>,
                     <5 NRF_DEFAULT_IRQ_PRIORITY>,
                     <6 NRF_DEFAULT_IRQ_PRIORITY>,
                     <7 NRF_DEFAULT_IRQ_PRIORITY>,
                     <8 NRF_DEFAULT_IRQ_PRIORITY>,
                     <9 NRF_DEFAULT_IRQ_PRIORITY>,
                     <10 NRF_DEFAULT_IRQ_PRIORITY>,
                     <11 NRF_DEFAULT_IRQ_PRIORITY>,
                     <12 NRF_DEFAULT_IRQ_PRIORITY>,
                     <13 NRF_DEFAULT_IRQ_PRIORITY>,
                     <14 NRF_DEFAULT_IRQ_PRIORITY>,
                     <15 NRF_DEFAULT_IRQ_PRIORITY>;
        #mbox-cells = <1>;
        nordic,tasks = <16>;
        nordic,tasks-mask = <0xfffffff0>;
    };
};
```

Previously mentioned attributes, such as support for the rv32emc RISC-V ISA and the 16 MHz (DT_FREQ_M(16)) clock frequency, are defined here. There is also a node under soc that defines the PPR as a coprocessor peripheral.

```dts
cpuppr_vpr: vpr@908000 {
    compatible = "nordic,nrf-vpr-coprocessor";
    reg = <0x908000 0x1000>;
    status = "disabled";
    #address-cells = <1>;
    #size-cells = <1>;
    ranges = <0x0 0x908000 0x1000>;
    power-domains = <&gpd NRF_GPD_SLOW_ACTIVE>;

    cpuppr_vevif_tx: mailbox@0 {
        compatible = "nordic,nrf-vevif-task-tx";
        reg = <0x0 0x1000>;
        status = "disabled";
        #mbox-cells = <1>;
        nordic,tasks = <16>;
        nordic,tasks-mask = <0xfffffff0>;
    };
};
```

The compatible property associates the PPR with the appropriate driver, which the controlling processor, in this case the nRF54H20 application processor (cpuapp), can use to configure and start it. Among Zephyr's miscellaneous drivers there is a nordic_vpr_launcher, which defines its device compatibility.

```c
#define DT_DRV_COMPAT nordic_nrf_vpr_coprocessor
```

The driver is relatively simple because it is only responsible for device initialization. The initialization function performs the operations described in the documentation.

```c
static int nordic_vpr_launcher_init(const struct device *dev)
{
    const struct nordic_vpr_launcher_config *config = dev->config;

#if DT_ANY_INST_HAS_PROP_STATUS_OKAY(source_memory)
    if (config->size > 0U) {
        LOG_DBG("Loading VPR (%p) from %p to %p (%zu bytes)", config->vpr,
                (void *)config->src_addr, (void *)config->exec_addr, config->size);
        memcpy((void *)config->exec_addr, (void *)config->src_addr, config->size);
    }
#endif

#if defined(CONFIG_SOC_NRF54L_CPUAPP_COMMON) && !defined(CONFIG_TRUSTED_EXECUTION_NONSECURE)
    nrf_spu_periph_perm_secattr_set(NRF_SPU00, nrf_address_slave_get((uint32_t)config->vpr),
                                    true);
#endif

    LOG_DBG("Launching VPR (%p) from %p", config->vpr, (void *)config->exec_addr);
    nrf_vpr_initpc_set(config->vpr, config->exec_addr);
    nrf_vpr_cpurun_set(config->vpr, true);

    return 0;
}
```

The various nrf_vpr_* functions are provided as inline functions by hal_nordic.
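Based on the register names in the documentation and the store offsets visible in the disassembly later in this post (CPURUN at offset 0x800, INITPC at 0x808), those helpers plausibly reduce to single register writes. The following is a sketch under those assumptions, not the actual hal_nordic source, and it uses a hand-rolled register layout rather than the real NRF_VPR_Type from the Nordic MDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: a minimal register-block layout consistent with the stores
 * seen in the disassembly of nordic_vpr_launcher_init below. The real
 * NRF_VPR_Type has many more registers. */
typedef struct {
    volatile uint32_t RESERVED[512]; /* 0x000 .. 0x7ff */
    volatile uint32_t CPURUN;        /* 0x800: 1 starts the core, 0 stops it */
    volatile uint32_t RESERVED1;     /* 0x804 */
    volatile uint32_t INITPC;        /* 0x808: initial program counter */
} VPR_REGS_SKETCH_Type;

static inline void sketch_vpr_initpc_set(VPR_REGS_SKETCH_Type *p_reg, uint32_t initpc)
{
    p_reg->INITPC = initpc; /* address the VPR starts executing from */
}

static inline void sketch_vpr_cpurun_set(VPR_REGS_SKETCH_Type *p_reg, bool enable)
{
    p_reg->CPURUN = enable ? 1UL : 0UL;
}
```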
The device configuration, which determines the memory source and destination addresses used by the launcher, is instantiated using various DT_* devicetree macros, which are expanded for every compatible node using DT_INST_FOREACH_STATUS_OKAY.

```c
/* obtain VPR address either from memory or partition */
#define VPR_ADDR(node_id)                                                                  \
	(DT_REG_ADDR(node_id) +                                                            \
	 COND_CODE_0(DT_FIXED_PARTITION_EXISTS(node_id), (0), (DT_REG_ADDR(DT_GPARENT(node_id)))))

#define NORDIC_VPR_LAUNCHER_DEFINE(inst)                                                   \
	IF_ENABLED(DT_INST_NODE_HAS_PROP(inst, source_memory),                            \
		   (BUILD_ASSERT((DT_REG_SIZE(DT_INST_PHANDLE(inst, execution_memory)) <=  \
				  DT_REG_SIZE(DT_INST_PHANDLE(inst, source_memory))),      \
				 "Execution memory exceeds source memory size");))         \
                                                                                           \
	static const struct nordic_vpr_launcher_config config##inst = {                   \
		.vpr = (NRF_VPR_Type *)DT_INST_REG_ADDR(inst),                             \
		.exec_addr = VPR_ADDR(DT_INST_PHANDLE(inst, execution_memory)),            \
		IF_ENABLED(DT_INST_NODE_HAS_PROP(inst, source_memory),                    \
			   (.src_addr = VPR_ADDR(DT_INST_PHANDLE(inst, source_memory)),    \
			    .size = DT_REG_SIZE(DT_INST_PHANDLE(inst, execution_memory)),))}; \
                                                                                           \
	DEVICE_DT_INST_DEFINE(inst, nordic_vpr_launcher_init, NULL, NULL, &config##inst,  \
			      POST_KERNEL, CONFIG_NORDIC_VPR_LAUNCHER_INIT_PRIORITY, NULL);

DT_INST_FOREACH_STATUS_OKAY(NORDIC_VPR_LAUNCHER_DEFINE)
```

We can see the initialization in action by building applications for both the application processor and the PPR. Zephyr's sysbuild allows building for multiple targets. The hello_world sample demonstrates this capability by building the same application, which just prints Hello world from {target}, for all specified targets.

```shell
west build -p -b nrf54h20dk/nrf54h20/cpuapp -T sample.sysbuild.hello_world.nrf54h20dk_cpuapp_cpuppr .
```

The -T argument specifies one of the sample's test configurations, which will automatically inject extra arguments into the build.

```yaml
sample.sysbuild.hello_world.nrf54h20dk_cpuapp_cpuppr:
  platform_allow:
    - nrf54h20dk/nrf54h20/cpuapp
  integration_platforms:
    - nrf54h20dk/nrf54h20/cpuapp
  extra_args:
    SB_CONF_FILE=sysbuild/nrf54h20dk_nrf54h20_cpuppr.conf
    hello_world_SNIPPET=nordic-ppr
```

SB_CONF_FILE specifies the sysbuild configuration. For the nRF54H20 development kit (DK), the configuration file only serves to specify the board target for the remote variant of the application, which is the PPR.

```
SB_CONFIG_REMOTE_BOARD="nrf54h20dk/nrf54h20/cpuppr"
```

hello_world_SNIPPET specifies a snippet that should be included in the build. It includes two devicetree overlay files. One, nordic-ppr.overlay, enables the cpuppr_vpr node for any board that incorporates a PPR.

```dts
&cpuppr_vpr {
	status = "okay";
};
```

Observant readers will notice that it appears Nordic's upcoming nRF9280 also includes a PPR. The second, nrf54h20_cpuapp.overlay, enables the memory region that is shared between the application processor and the PPR.

```dts
&cpuppr_ram3x_region {
	status = "okay";
};
```

After the build, the application processor image will reside in build/hello_world, while the PPR image will reside in build/remote. We can use the respective Arm and RISC-V toolchains from the Zephyr SDK to inspect the ELF files and understand how the configuration translates to instructions in the binary. To disassemble the application processor image, use the arm-zephyr-eabi toolchain's objdump.

```shell
$ZEPHYR_SDK_PATH/arm-zephyr-eabi/bin/arm-zephyr-eabi-objdump -D build/hello_world/zephyr/zephyr.elf
```

For the PPR image, the riscv64-zephyr-elf variant can be used.

```shell
$ZEPHYR_SDK_PATH/riscv64-zephyr-elf/bin/riscv64-zephyr-elf-objdump -D build/remote/zephyr/zephyr.elf
```

In the previously shown nordic_vpr_launcher driver, the DEVICE_DT_INST_DEFINE macro is used to define the PPR peripheral device with a POST_KERNEL initialization level.
This means that the device should be initialized after the kernel boots. This is handled by a call to z_sys_init_run_level with level POST_KERNEL.

```c
static void z_sys_init_run_level(enum init_level level)
{
	static const struct init_entry *levels[] = {
		__init_EARLY_start,
		__init_PRE_KERNEL_1_start,
		__init_PRE_KERNEL_2_start,
		__init_POST_KERNEL_start,
		__init_APPLICATION_start,
#ifdef CONFIG_SMP
		__init_SMP_start,
#endif /* CONFIG_SMP */
		/* End marker */
		__init_end,
	};
	const struct init_entry *entry;

	for (entry = levels[level]; entry < levels[level+1]; entry++) {
		const struct device *dev = entry->dev;
		int result;

		sys_trace_sys_init_enter(entry, level);
		if (dev != NULL) {
			result = do_device_init(entry);
		} else {
			result = entry->init_fn.sys();
		}
		sys_trace_sys_init_exit(entry, level, result);
	}
}
```

We can see the disassembly in the application processor image.

```
0e0a9ca4 <z_sys_init_run_level>:
 e0a9ca4:	b538      	push	{r3, r4, r5, lr}
 e0a9ca6:	4b09      	ldr	r3, [pc, #36]	; (e0a9ccc)
 e0a9ca8:	f853 4020 	ldr.w	r4, [r3, r0, lsl #2]
 e0a9cac:	3001      	adds	r0, #1
 e0a9cae:	f853 5020 	ldr.w	r5, [r3, r0, lsl #2]
 e0a9cb2:	42a5      	cmp	r5, r4
 e0a9cb4:	d800      	bhi.n	e0a9cb8
 e0a9cb6:	bd38      	pop	{r3, r4, r5, pc}
 e0a9cb8:	6863      	ldr	r3, [r4, #4]
 e0a9cba:	b123      	cbz	r3, e0a9cc6
 e0a9cbc:	4620      	mov	r0, r4
 e0a9cbe:	f003 f81f 	bl	e0acd00 <do_device_init>
 e0a9cc2:	3408      	adds	r4, #8
 e0a9cc4:	e7f5      	b.n	e0a9cb2
 e0a9cc6:	6823      	ldr	r3, [r4, #0]
 e0a9cc8:	4798      	blx	r3
 e0a9cca:	e7fa      	b.n	e0a9cc2
 e0a9ccc:	0e0ae818 	mcreq	8, 0, lr, cr10, cr8, {0}
```

The second instruction, ldr r3, [pc, #36], loads the word at a 36-byte offset from the current program counter (pc), which is the literal stored on the last line of the function disassembly: 0e0ae818. Ignore the attempted instruction decoding by objdump for this value and for all stored memory addresses below. This is the address of the levels array, each element of which points to the first init_entry of the corresponding init level.

```
0e0ae818:
 e0ae818:	0e0ad3cc 	cdpeq	3, 0, cr13, cr10, cr12, {6}
 e0ae81c:	0e0ad3cc 	cdpeq	3, 0, cr13, cr10, cr12, {6}
 e0ae820:	0e0ad40c 	cdpeq	4, 0, cr13, cr10, cr12, {0}
 e0ae824:	0e0ad414 	cfmvdlreq	mvd10, sp
 e0ae828:	0e0ad464 	cdpeq	4, 0, cr13, cr10, cr4, {3}
 e0ae82c:	0e0ad474 	mcreq	4, 0, sp, cr10, cr4, {3}
 e0ae830:	3566726e 	strbcc	r7, [r6, #-622]!	; 0xfffffd92
 e0ae834:	30326834 	eorscc	r6, r2, r4, lsr r8
 e0ae838:	30406b64 	subcc	r6, r0, r4, ror #22
 e0ae83c:	302e392e 	eorcc	r3, lr, lr, lsr #18
 e0ae840:	66726e2f 	ldrbtvs	r6, [r2], -pc, lsr #28
 e0ae844:	32683435 	rsbcc	r3, r8, #889192448	; 0x35000000
 e0ae848:	70632f30 	rsbvc	r2, r3, r0, lsr pc
 e0ae84c:	70706175 	rsbsvc	r6, r0, r5, ror r1
 e0ae850:	6c654800 	stclvs	8, cr4, [r5], #-0
 e0ae854:	77206f6c 	strvc	r6, [r0, -ip, ror #30]!
 e0ae858:	646c726f 	strbtvs	r7, [ip], #-623	; 0xfffffd91
 e0ae85c:	6f726620 	svcvs	0x00726620
 e0ae860:	7325206d 	; instruction: 0x7325206d
 e0ae864:	6146000a 	cmpvs	r6, sl
 e0ae868:	64656c69 	strbtvs	r6, [r5], #-3177	; 0xfffff397
 e0ae86c:	206f7420 	rsbcs	r7, pc, r0, lsr #8
 e0ae870:	6f626572 	svcvs	0x00626572
 e0ae874:	203a746f 	eorscs	r7, sl, pc, ror #8
 e0ae878:	6e697073 	mcrvs	0, 3, r7, cr9, cr3, {3}
 e0ae87c:	676e696e 	strbvs	r6, [lr, -lr, ror #18]!
 e0ae880:	646e6520 	strbtvs	r6, [lr], #-1312	; 0xfffffae0
 e0ae884:	7373656c 	cmnvc	r3, #108, 10	; 0x1b000000
 e0ae888:	2e2e796c 	vnmulcs.f16	s14, s28, s25
 e0ae88c:	69000a2e 	stmdbvs	r0, {r1, r2, r3, r5, r9, fp}
 e0ae890:	322d6370 	eorcc	r6, sp, #112, 6	; 0xc0000001
 e0ae894:	0032312d 	eorseq	r3, r2, sp, lsr #2
 e0ae898:	2d637069 	stclcs	0, cr7, [r3, #-420]!	; 0xfffffe5c
 e0ae89c:	00332d32 	eorseq	r2, r3, r2, lsr sp
 e0ae8a0:	736d6369 	cmnvc	sp, #-1543503871	; 0xa4000001
 e0ae8a4:	6f775f67 	svcvs	0x00775f67
 e0ae8a8:	00716b72 	rsbseq	r6, r1, r2, ror fp
```
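To help decode the words in this dump, recall the shape of an init_entry. The following is simplified from Zephyr's include/zephyr/init.h (fields unrelated to this walkthrough omitted); each entry is a two-word pair, an init function pointer followed by a device pointer.

```c
/* Simplified from include/zephyr/init.h: each entry in the init arrays is
 * a function pointer followed by a device pointer (NULL for plain SYS_INIT
 * entries), i.e. the two-word pairs decoded below. */
union init_function {
	int (*sys)(void);
	int (*dev)(const struct device *dev);
};

struct init_entry {
	union init_function init_fn;
	const struct device *dev;
};
```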
The fourth element, 0e0ad414, is __init_POST_KERNEL_start, and the first init_entry at that address belongs to the PPR peripheral device. At that address, we'll find the address of its init function (0e0ac7cb) and a pointer to the device structure (0e0ad474).

```
0e0ad414 <__init___device_dts_ord_60>:
 e0ad414:	0e0ac7cb 	cdpeq	7, 0, cr12, cr10, cr11, {6}
 e0ad418:	0e0ad474 	mcreq	4, 0, sp, cr10, cr4, {3}
```

As expected, the init function address is the location of the nordic_vpr_launcher_init function. Or, almost: there is a one-byte offset between the address stored in the init_entry (0e0ac7cb) and the address of the launcher function (0e0ac7ca). In a moment we'll see why.

```
0e0ac7ca <nordic_vpr_launcher_init>:
 e0ac7ca:	b510      	push	{r4, lr}
 e0ac7cc:	6844      	ldr	r4, [r0, #4]
 e0ac7ce:	68e2      	ldr	r2, [r4, #12]
 e0ac7d0:	b11a      	cbz	r2, e0ac7da
 e0ac7d2:	e9d4 0101 	ldrd	r0, r1, [r4, #4]
 e0ac7d6:	f000 fd56 	bl	e0ad286
 e0ac7da:	e9d4 3200 	ldrd	r3, r2, [r4]
 e0ac7de:	f8c3 2808 	str.w	r2, [r3, #2056]	; 0x808
 e0ac7e2:	2201      	movs	r2, #1
 e0ac7e4:	6823      	ldr	r3, [r4, #0]
 e0ac7e6:	2000      	movs	r0, #0
 e0ac7e8:	f8c3 2800 	str.w	r2, [r3, #2048]	; 0x800
 e0ac7ec:	bd10      	pop	{r4, pc}
```

z_sys_init_run_level calls do_device_init on the PPR peripheral.

```c
static int do_device_init(const struct init_entry *entry)
{
	const struct device *dev = entry->dev;
	int rc = 0;

	if (entry->init_fn.dev != NULL) {
		rc = entry->init_fn.dev(dev);
		/* Mark device initialized. If initialization
		 * failed, record the error condition.
		 */
		if (rc != 0) {
			if (rc < 0) {
				rc = -rc;
			}
			if (rc > UINT8_MAX) {
				rc = UINT8_MAX;
			}
			dev->state->init_res = rc;
		}
	}

	dev->state->initialized = true;

	if (rc == 0) {
		/* Run automatic device runtime enablement */
		(void)pm_device_runtime_auto_enable(dev);
	}

	return rc;
}
```

In the disassembly, it loads the init function address and device pointer from the passed init_entry into r3 and r4 respectively (ldrd r3, r4, [r0]). Then, if the device initialization function is not NULL, it invokes the function (blx r3) after moving the device pointer into r0 to be passed as the first argument (mov r0, r4).

```
0e0acd00 <do_device_init>:
 e0acd00:	b510      	push	{r4, lr}
 e0acd02:	e9d0 3400 	ldrd	r3, r4, [r0]
 e0acd06:	b933      	cbnz	r3, e0acd16
 e0acd08:	2000      	movs	r0, #0
 e0acd0a:	68e2      	ldr	r2, [r4, #12]
 e0acd0c:	7853      	ldrb	r3, [r2, #1]
 e0acd0e:	f043 0301 	orr.w	r3, r3, #1
 e0acd12:	7053      	strb	r3, [r2, #1]
 e0acd14:	bd10      	pop	{r4, pc}
 e0acd16:	4620      	mov	r0, r4
 e0acd18:	4798      	blx	r3
 e0acd1a:	2800      	cmp	r0, #0
 e0acd1c:	d0f4      	beq.n	e0acd08
 e0acd1e:	2800      	cmp	r0, #0
 e0acd20:	bfb8      	it	lt
 e0acd22:	4240      	neglt	r0, r0
 e0acd24:	28ff      	cmp	r0, #255	; 0xff
 e0acd26:	bfa8      	it	ge
 e0acd28:	20ff      	movge	r0, #255	; 0xff
 e0acd2a:	68e3      	ldr	r3, [r4, #12]
 e0acd2c:	7018      	strb	r0, [r3, #0]
 e0acd2e:	e7ec      	b.n	e0acd0a
```

The use of the blx instruction ("Branch with Link and Exchange") is the reason for the one-byte offset in the address. The Cortex-M33 implements the Armv8-M architecture, which uses the T32 (formerly Thumb-2) instruction set. The Arm architecture supports "interworking", which allows dynamically switching between the A32 and T32 instruction sets: for applicable interworking instructions, the least significant bit of the destination address indicates the ISA used by the callee. However, Armv8-M only supports T32, so the least significant bit must always be 1. The Armv8-M Architecture Reference Manual includes the following description for the blx instruction:

"Bit[0] complies with the Arm architecture interworking rules for switching between the A32 and T32 instruction sets. However, Armv8-M only supports the T32 instruction set, so bit[0] must be 1. If bit[0] is 0 the PE takes an INVSTATE UsageFault exception on the instruction at the target address."
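The same rule can be observed from C: when the compiler targets a Thumb-only core, taking a function's address yields a value with bit[0] set, which is exactly the one-byte difference seen above. Below is a contrived, self-contained example (not from the Zephyr tree) that shows this when built for a Cortex-M target.

```c
#include <stdint.h>
#include <stdio.h>

/* Contrived example: on a Thumb-only target such as the Cortex-M33, the
 * taken address of a function has bit[0] set, so the pointer value is one
 * greater than the symbol's link address (0x...cb vs 0x...ca above). */
static int example_init(void)
{
	return 0;
}

int main(void)
{
	int (*fn)(void) = example_init;

	printf("pointer: %p, bit[0]: %lu\n", (void *)fn,
	       (unsigned long)((uintptr_t)fn & 1U));

	return fn();
}
```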
In the device_area section of the application processor binary, there is a device structure for the PPR peripheral. Its address matches the second member of the init_entry that was passed to do_device_init (0e0ad474).

```
0e0ad474 <__device_dts_ord_60>:
 e0ad474:	0e0ae95e 	; instruction: 0x0e0ae95e
 e0ad478:	0e0ae5e8 	cfsh32eq	mvfx14, mvfx10, #-8
 e0ad47c:	00000000 	andeq	r0, r0, r0
 e0ad480:	2f011404 	svccs	0x00011404
 e0ad484:	00000000 	andeq	r0, r0, r0
```

The second member of the device is a pointer to the config (0e0ae5e8) for the PPR peripheral.

```
0e0ae5e8:
 e0ae5e8:	5f908000 	svcpl	0x00908000
 e0ae5ec:	2fc00000 	svccs	0x00c00000
 e0ae5f0:	0e0e4000 	cdpeq	0, 0, cr4, cr14, cr0, {0}
 e0ae5f4:	0000f800 	andeq	pc, r0, r0, lsl #16
```

The nordic_vpr_launcher defines the config as follows.

```c
struct nordic_vpr_launcher_config {
	NRF_VPR_Type *vpr;
	uintptr_t exec_addr;
#if DT_ANY_INST_HAS_PROP_STATUS_OKAY(source_memory)
	uintptr_t src_addr;
	size_t size;
#endif
};
```

In the nRF54H20 memory layout documentation, the slow global RAM (RAM3x) that the PPR executes from spans the address range 2fc00000 to 2fc14000. The exec_addr matches the start of that range, 2fc00000. The src_addr is 0e0e4000, which resides in the MRAM_10 address range (0e000000 to 0e100000); MRAM_10 is non-volatile memory used for storing firmware images. The VPR launcher copies the PPR firmware from MRAM_10 to RAM3x, sets the PPR program counter to the start address of RAM3x, then starts the PPR. In the dump of the PPR firmware image (now we're looking at RISC-V instructions), we can see that the start address corresponds to the expected __start symbol.

```
2fc00000 <__start>:
2fc00000:	00001297          	auipc	t0,0x1
2fc00004:	90028293          	addi	t0,t0,-1792 # 2fc00900 <_isr_wrapper>
2fc00008:	00328293          	addi	t0,t0,3
2fc0000c:	30529073          	csrw	mtvec,t0
2fc00010:	00000297          	auipc	t0,0x0
2fc00014:	0f028293          	addi	t0,t0,240 # 2fc00100 <_irq_vector_table>
2fc00018:	30729073          	csrw	0x307,t0
2fc0001c:	0a50006f          	j	2fc008c0 <_vector_end>
```

However, the PPR firmware image needs to be present in the MRAM_10 region before it can be copied. This is handled by sysbuild on west flash, as the flash order is defined in the generated domains.yaml.

```yaml
default: hello_world
build_dir: zephyr/samples/sysbuild/hello_world/build
domains:
  - name: hello_world
    build_dir: zephyr/samples/sysbuild/hello_world/build/hello_world
  - name: remote
    build_dir: zephyr/samples/sysbuild/hello_world/build/remote
flash_order:
  - remote
  - hello_world
```

After successful programming, the application processor will boot, initialize and start the PPR, then print to the configured console on /dev/ttyACM0.

```
*** Booting nRF Connect SDK v2.8.0-a2386bfc8401 ***
*** Using Zephyr OS v3.7.99-0bc3393fb112 ***
Hello world from nrf54h20dk@0.9.0/nrf54h20/cpuapp
```

After being started by the application processor, the PPR will boot and also print to its configured console on /dev/ttyACM1.

```
*** Booting nRF Connect SDK v2.8.0-a2386bfc8401 ***
*** Using Zephyr OS v3.7.99-0bc3393fb112 ***
Hello world from nrf54h20dk@0.9.0/nrf54h20/cpuppr
```
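Hello world is only the starting point. The cpuppr_vevif_tx and cpuppr_vevif_rx mailbox nodes in the devicetree above are how the two cores can signal each other once both are running. As a rough, hypothetical sketch, triggering a VEVIF task on the PPR from the application processor could look like the following, assuming Zephyr's mbox API; the client node, its mboxes property, and the channel number are inventions for illustration, and real projects typically reach VEVIF indirectly through Zephyr's IPC services.

```c
/* Rough sketch (not from the post): signal the PPR from the application
 * processor over a VEVIF task channel using Zephyr's mbox API. Assumes a
 * hypothetical devicetree client node such as:
 *
 *   / {
 *       ppr_signal {
 *           compatible = "vnd,ppr-signal";   // hypothetical binding
 *           mboxes = <&cpuppr_vevif_tx 4>;   // channel 4, for example
 *           mbox-names = "ppr";
 *       };
 *   };
 */
#include <zephyr/drivers/mbox.h>

static const struct mbox_dt_spec ppr_channel =
	MBOX_DT_SPEC_GET(DT_PATH(ppr_signal), ppr);

int signal_ppr(void)
{
	/* VEVIF task channels carry no payload, only a doorbell-style event. */
	return mbox_send_dt(&ppr_channel, NULL);
}
```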
It is exciting to see RISC-V used for domain-specific operations alongside the familiar Arm processors present in many microcontrollers. With the incorporation of VPR processors across many of Nordic's new SoCs, it is clear that we'll continue to see more heterogeneous compute resources in the coming years. Understanding the architecture of the system and the interaction between components allows us to more fully leverage the capabilities of these products. As we continue exploring the PPR and FLPR VPR processors, we'll see how they can be used to improve performance and expand functionality.