[HN Gopher] VPR: Nordic's First RISC-V Processor
___________________________________________________________________
VPR: Nordic's First RISC-V Processor
Author : hasheddan
Score : 152 points
Date : 2024-12-24 15:50 UTC (2 days ago)
(HTM) web link (danielmangum.com)
(TXT) w3m dump (danielmangum.com)
| vardump wrote:
| So many different cores.
|
| Dual M33s, plus the VPR RISC-V cores: PPR (a low-power 16 MHz
| RISC-V RV32E? for external I/O) and FLPR (a RISC-V RV32E running
| at 320 MHz, for software-defined peripherals).
|
| I wonder what the max square wave GPIO toggle rate is with FLPR.
| Can it compete with, for example, the RP2040/2350 PIO?
| crest wrote:
| They would have to fuck up massively for an I/O co-processor to
| lack single-cycle GPIO access, which would at least allow
| driving a 320/2 = 160 MHz, 50% duty-cycle square wave, but
| sustaining that deterministically through priority access to
| memory and DMA is something else. At 150 MHz the RP2350 is able
| to sustain that bandwidth with hard real-time guarantees,
| allowing it to do things which are hard or impossible on other
| chips in the same price class, e.g. glitch-free 720p video
| output "in software".
| petra wrote:
| I wonder if that will be the effect of the rp2350, to push the
| whole market toward software defined peripherals?
| MisterTea wrote:
| I would say not really, as software-defined peripherals have
| been around for some time. There are the Parallax Propeller
| and GreenArrays chips, which have no hard I/O blocks;
| everything is done in CPU cores. The TI PRU is similar.
| hasheddan wrote:
| Author here. I've got a few more posts on VPR coming in the next
| couple of weeks. If you have any requests for deep dives into
| specific aspects of the architecture, feel free to drop them
| here!
| SV_BubbleTime wrote:
| All of this reeks of complexity crisis to me. That you need to
| know so much and do so much work - just in order to do the work
| you want to do.
|
| Explain why I'm wrong, please.
| AlotOfReading wrote:
| The article goes into more detail than it strictly needs to
| because the purpose is educational. However, a lot of what
| it's presenting is simplified interfaces and relevant details
| rather than the true complexity of the whole.
|
| Modern hardware is just fundamentally complex, especially if
| you want to make full use of the particular features of each
| platform.
| fidotron wrote:
| You are wrong.
|
| When more general-purpose hardware (i.e. CPU cores) is added
| to chips like this, it is to replace the need for single-purpose
| devices. True nightmarish complexity comes from
| enormous numbers of highly specific single purpose devices
| which all have their own particular oddities.
|
| There was a chip a while back which took this to a crazy
| extreme but threw out the whole universe in the process:
| https://www.greenarraychips.com/
| awjlogan wrote:
| Not wrong, especially for microcontrollers where
| micro/nanosecond determinism may be important - software
| running on general purpose cores is not suitable for that.
| They can also be orders of magnitude more energy efficient
| than running a full core just to twiddle some pins.
|
| I've got a project that uses 4 hardware serial modules,
| timers, ADC, event system etc all dedicated function. Sure,
| they have their quirks but once you've learnt them you can
| reuse a lot of the drivers across multiple products,
| especially for a given vendor.
|
| Of course there is some cost, but it's finding the balance
| for your product that is important.
| fidotron wrote:
| > They can also be orders of magnitude more energy
| efficient than running a full core just to twiddle some
| pins.
|
| This used to be true, but as fabrication shrinks first
| you move to quasi FSMs (like the PIO blocks) and
| eventually mini processors since those are smaller than
| the dedicated units of the previous generation. When you
| get the design a bit wrong you end up with the esp32,
| where the lack of general computation in peripherals
| radically bumps memory requirements, and thus power
| usage.
|
| This trend also occurs in GPUs where functionality
| eventually gets merged into more uniform blocks to make
| room for newly conceived specialist units that have
| become viable.
| awjlogan wrote:
| No, still true - you're never going to beat the
| determinism, size, and power of a few flops and some
| logic to drive a common interface directly compared to a
| full core with architectural state and memory. E.g., just
| entering an interrupt takes 10-15-odd cycles, then a memory
| access or two to set a pin, and then 10-15 cycles again
| to restore and exit.
|
| Additionally, micros have to be much more robust electrically
| than a cutting-edge (or even 14 nm) CPU/GPU, and available
| for extended (decade) timespans, so the economics driving
| the shrink are different.
|
| Small, fast cores have eaten the lunch of e.g. large
| dedicated DSP blocks, for sure, but those are niche cases
| where volume is low, so eventually the hardware cost, and
| the cost of developing on weird architectures, exceeds
| that of running a general-purpose core.
| fidotron wrote:
| > No, still true - you're never going to beat the
| determinism, size, and power of a few flops and some
| logic to drive a common interface directly compared to a
| full core with architectural state and memory.
|
| But you must know what you intend to do when designing
| the MCU, and history shows (and some of the questioning
| here also shows) that this isn't the case. As you point
| out expected lifespans are long, so what is a designer to
| do?
|
| The ESP32 case is interesting because it comes so close,
| to the point I believe the RMT peripheral probably partly
| inspired the PIO, thanks to how widely it has been used
| for other things and how it breaks.
|
| The key weakness of the RMT is that it expects the data
| structures used to control it to be prepared in memory
| already, almost certainly by the CPU.
| This means that to alter the data being sent out requires
| the main app processor, the DMA and the peripheral to be
| involved, and we are hammering the memory bus while doing
| this.
|
| A similar thing occurs with almost any non trivial SPI
| usage where a lot of people end up building "big"
| (relatively) buffers in memory in advance.
|
| Both of those situations are very common and bad.
| Assuming the tiny cores can have their own program memory
| they will be no less deterministic than any other sort of
| peripheral while radically freeing up the central part of
| the system.
|
| One of the main things I have learned over the years is
| people wildly overstate the cost of computation and
| understate the cost of moving data around. If you can
| reduce the data a lot at the cost of a bit more
| computation that is a big win.
| awjlogan wrote:
| > But you must know what you intend to do when designing
| the MCU, and history shows (and some of the questioning
| here also shows) that this isn't the case. As you point
| out expected lifespans are long, so what is a designer to
| do?
|
| Designers _do_ know that UARTs, SPIs, I2C, timers etc.
| will be around essentially forever. Anything new has to
| be so much faster/better, the competition being the
| status quo and its long tail, that you would lay down a
| dedicated block anyway.
|
| I think we'll disagree, but I'm not convinced by many of
| the cases given here (usually DVI on an RP2040...) as you
| would just buy a slightly higher spec and better
| optimised system that has the interface already built in.
| Personal opinion: great fun to play with and definitely
| good to have a couple to handle niche interfaces (e.g.
| OneWire), but not for majority of use cases.
|
| > A similar thing occurs with almost any non trivial SPI
| usage where a lot of people end up building "big"
| (relatively) buffers in memory in advance.
|
| This is neither here nor there for a "PIO" or a fixed
| function - there has to be state and data _somewhere_. I
| would rather allocate just what is needed for e.g. a UART
| (on my weapon of choice, that amounts to a heady 40 bits
| _local_ to the peripheral, written once to configure it,
| overloaded with SPI and I2C functionality) and not
| trouble the memory bus other than for data (well said on
| data movement - it burns a lot and it's harder to capture).
|
| > Assuming the tiny cores can have their own program
| memory they will be no less deterministic than any other
| sort of peripheral while radically freeing up the central
| part of the system.
|
| Agreed, though only if it's dedicated to a single function,
| of course; otherwise you have access contention. And, of
| course, we already have radically freed up the central
| part of the system :P
|
| Regardless, enjoyed the conversation, thank you!
| fidotron wrote:
| > Regardless, enjoyed the conversation, thank you!
|
| Likewise, very much so!
| kragen wrote:
| If you have a programmable state machine that's waiting
| for a pin transition, it can easily do the thing it's
| waiting to do in the clock cycle after that transition.
| It doesn't have to enter an interrupt handler. That's how
| the GA144 and the RP2350 do their I/O. Padauk chips have
| a second hardware thread and deterministically context
| switch every cycle, so the response latency is still less
| than 10-15 cycles, like 1-2. I think old ARM FIQ state
| also effectively works this way, switching register banks
| on interrupt so no time is needed to save registers on
| interrupt entry, and I think the original Z80 (RIP this
| year) also has this feature. Some RISC-V cores
| (CH32V003?) also have it.
|
| An alternate register bank for the main CPU is bigger
| than a PWM timer peripheral or an SPI peripheral, sure,
| but you can program it to do things you didn't think of
| before tapeout.
| hn3er1q wrote:
| Thank you so much for asking, I have oh so many requests...
|
| Personally, I'm mostly interested in the ARM vs RISC-V compare
| and contrast.
|
| - I'd be very interested in comparing static memory and RAM
| requirements for programs that are as similar as you can
| make them at the C level, using whatever toolchain Nordic wants
| you to use.
|
| - Since you're looking to do deep dives I think looking into
| differences in the interrupt architecture and any implications
| on stack memory requirements and/or latency would be
| interesting, especially as VPR is a "peripheral processor"
|
| - It would be interesting to get cycle counts for similar
| programs between ARM and RISC-V. This might not be very
| comparable though, as it seems the ARM architectures are more
| complex, thus we expect a lower CPI from them. Anyway, I think
| CPI numbers would be interesting.
|
| I could go on but I don't want to be greedy. :)
| rwmj wrote:
| Why did they go with the 64 bit Arm core instead of an RV64
| core? (Or an alternative question: why go with the 32 bit
| RISC-V core instead of an Arm M0?)
|
| Does having mixed architectures cause any issues, for example
| in developer tools or build systems? (I guess not, since
| already having 32 vs 64 bit cores means you have effectively a
| "mixed architecture" even if they were both Arm or RISC-V)
|
| What's the RISC-V core derived from (eg. Rocket Chip? Pico?) or
| is it their own design?
| als0 wrote:
| > Why did they go with the 64 bit Arm core
|
| ARM Cortex-M33 is a 32-bit core, not 64-bit.
| crest wrote:
| They haven't gone with a 64-bit ARM core. The ARMv8-M isn't
| 64-bit, unlike ARMv8-R and ARMv8-A (the nomenclature can get
| confusing). The differences between ARMv7-M (especially with
| the optional DSP and FPU extensions) and ARMv8-M Mainline are
| fairly minor unless you go bigger with an M55 or M85, which
| (optionally) adds the Helium SIMD extension. At the low end,
| ARMv8-M Baseline adds a few quality-of-life features over
| ARMv6-M (e.g. the ability to reasonably efficiently load large
| immediate values without resorting to a constant pool). Also
| the MPU got cleaned up to make it a little less annoying to
| configure.
| rwmj wrote:
| Thanks for the clarification. Confusing terminology!
| pm215 wrote:
| ARMv8A and ARMv8R can both be pure 32 bit as well,
| incidentally -- e.g. Cortex-A32 and Cortex-R52. v8A added
| 64 bit, but it didn't take away 32 bit. It's not until v9A
| that 32 bit at the OS level was removed, and even there
| it's still allowed to implement 32 bit support for
| userspace.
| crest wrote:
| Will open-source developers unable or unwilling to sign an NDA
| get access to a toolchain to run their own code on the RISC-V
| co-processors? Is the bus matrix documented somewhere? Does the
| fast co-processor have access to DMA engines and interrupts?
| janice1999 wrote:
| FYI Nordic said on their YouTube channel that the RISC-V
| toolchain that already ships with Zephyr's SDK will support
| the cores. See around 00:56:32.520 [1]
|
| [1] https://www.youtube.com/watch?v=ef87Gym_D5c
| hasheddan wrote:
| Indeed. It is used in this post to compile the Zephyr Hello
| World example for the PPR.
| Scene_Cast2 wrote:
| I was looking into these recently. The current batch of the
| nRF54L15's was recalled, so I wonder when mass availability will
| happen. It looks like an interesting upgrade to the nRF52 though.
|
| The reason why I was looking at it is because I'm trying to hook
| up a 1kHz sampling ADC while streaming the data over BLE, and I
| need either a good DMA engine that can trigger on an external pin
| without a CPU interrupt, or a second core. I went down the dual
| core route, but I'd love to hear people's experience with DMA on
| the nRF52s and ESP32-H2s, whether it's finicky or not, and
| whether it's worth investing time into.
| dwnw wrote:
| DMA will work fine. I remember a time when we were rewarded for
| making do with what we had rather than putting a handful of CPU
| cores on a sensor. These do not sound interesting to me,
| honestly. It sounds more like they can't figure out what they
| need to make.
| gonzo wrote:
| That was a time when gates were not as plentiful as they are
| now. Both of the new Nordic SoCs are designed on a 22 nm
| process.
|
| It can be a challenge deciding what to use all the gates for.
| bfrog wrote:
| I kind of wonder why Nordic bothered sticking with arm cores at
| all. The competition isn't
| IshKebab wrote:
| They'll probably replace them but given they had no RISC-V
| expertise I imagine they made these small coprocessors first in
| order to gain experience. You can't magically do everything all
| at once.
| eschneider wrote:
| Ok, I'm on a project that just picked N54 processors for a set
| of new designs. W/O arm cores it wouldn't even be considered.
| The RISC-V cores are useful and will be put to work, but they
| aren't exactly a selling point.
|
| Why did we end up with N54s? Price, Performance, BLE6 support,
| PRICE.
| janice1999 wrote:
| A lot of software stacks are still only optimised for ARM.
| TrustZone is also far more mature and supported than the RISC-V
| equivalents.
| bfrog wrote:
| Is TrustZone really all that useful? I haven't had a need for
| anything like it myself!
|
| esp-c6 module parts are like $2-3 and already have an FCC
| cert, low power, and support all the same protocols, I
| believe. Are nrf54 modules really that cheap? Can I program
| them natively with Rust yet?
| jdietrich wrote:
| If you care at all about security, then yes, it is
| tremendously useful.
| explodingwaffle wrote:
| What would you consider the "RISC-V equivalent" of TrustZone?
| Last time I was curious I didn't find anything.
|
| (FWIW I agree with the other commenter that these
| ""security"" features are useless, and feel to me more like
| check-box compliance than anything else (Why does TrustZone
| work with function calls? What's wrong with IPC! Also, what's
| wrong with privileged mode?). Just seems like a bit of a
| waste of silicon really.)
| IshKebab wrote:
| There are some examples listed here:
|
| https://en.wikipedia.org/wiki/Trusted_execution_environment
| jdietrich wrote:
| MultiZone, OpenMZ, Keystone, maybe Penglai or ProvenCore, I
| can't really keep up. That answer goes a long way to
| explaining the appeal of TrustZone.
___________________________________________________________________
(page generated 2024-12-26 23:01 UTC)