[HN Gopher] VPR: Nordic's First RISC-V Processor
       ___________________________________________________________________
        
       VPR: Nordic's First RISC-V Processor
        
       Author : hasheddan
       Score  : 152 points
       Date   : 2024-12-24 15:50 UTC (2 days ago)
        
 (HTM) web link (danielmangum.com)
 (TXT) w3m dump (danielmangum.com)
        
       | vardump wrote:
       | So many different cores.
       | 
       | Dual M33s, VPR, consisting of PPR (a low power 16 MHz RISC-V
       | RV32E? for external I/O) and FLPR (a RISC-V RV32E running at 320
       | MHz, for software defined peripherals).
       | 
       | I wonder what the max square wave GPIO toggle rate is with FLPR.
       | Can it compete with for example RP2040/2350 PIO?
        
         | crest wrote:
         | They would have to fuck up massively for an I/O co-processor to
         | lack single cycle GPIO access which would at least allow
         | driving a 320/2 = 160 MHz 50% duty-cycle square wave, but sustaining that
         | deterministically through priority access to memory and DMA is
         | something else. At 150MHz the RP2350 is able to sustain that
         | bandwidth with hard realtime guarantees allowing it to do
         | things which are hard to impossible on other chips in the same
         | price class e.g. glitchfree 720p video output "in software".
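
The "320/2" figure in the comment above can be sketched as simple arithmetic. This is a minimal sketch, assuming a core that can write a GPIO in a single cycle with no loop overhead; the function name and the `cycles_per_toggle` parameter are illustrative and not taken from any vendor documentation.

```python
def max_square_wave_hz(core_clock_hz: float, cycles_per_toggle: int = 1) -> float:
    """Upper bound on a 50% duty-cycle square wave driven by software.

    One full period needs two toggles (high, then low), so the output
    frequency is the clock divided by twice the cycles spent per toggle.
    """
    if cycles_per_toggle < 1:
        raise ValueError("cycles_per_toggle must be at least 1")
    return core_clock_hz / (2 * cycles_per_toggle)


# Clock figures quoted in the thread; single-cycle toggling is an assumption.
print(max_square_wave_hz(320e6) / 1e6, "MHz at 320 MHz, 1 cycle per toggle")
print(max_square_wave_hz(320e6, cycles_per_toggle=2) / 1e6, "MHz if each toggle costs 2 cycles")
print(max_square_wave_hz(150e6) / 1e6, "MHz at 150 MHz, 1 cycle per toggle")
```

The bound says nothing about whether the rate can be sustained while other bus masters compete for memory, which is the harder problem the comment points at.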
        
         | petra wrote:
         | I wonder if that will be the effect of the rp2350, to push the
         | whole market toward software defined peripherals?
        
           | MisterTea wrote:
           | I would say not really as software defined peripherals have
           | been around for some time. There is the Parallax Propeller
           | and GreenArrays chips which have no hard IO blocks;
           | everything is done in CPU cores. The TI PRU is similar.
        
       | hasheddan wrote:
       | Author here. I've got a few more posts on VPR coming in the next
       | couple of weeks. If you have any requests for deep dives into
       | specific aspects of the architecture, feel free to drop them
       | here!
        
         | SV_BubbleTime wrote:
         | All of this reeks of complexity crisis to me. That you need to
         | know so much and do so much work, just in order to do the work
         | you want to do.
         | 
         | Explain why I'm wrong, please.
        
           | AlotOfReading wrote:
           | The article goes into more detail than it strictly needs to
           | because the purpose is educational. However, a lot of what
           | it's presenting is simplified interfaces and relevant details
           | rather than the true complexity of the whole.
           | 
           | Modern hardware is just fundamentally complex, especially if
           | you want to make full use of the particular features of each
           | platform.
        
           | fidotron wrote:
           | You are wrong.
           | 
           | When more general purpose hardware (i.e. CPU cores) is added
           | to chips like this it is to replace the need for single
           | purpose devices. True nightmarish complexity comes from
           | enormous numbers of highly specific single purpose devices
           | which all have their own particular oddities.
           | 
           | There was a chip a while back which took this to a crazy
           | extreme but threw out the whole universe in the process
           | https://www.greenarraychips.com/
        
             | awjlogan wrote:
             | Not wrong, especially for microcontrollers where
             | micro/nanosecond determinism may be important - software
             | running on general purpose cores is not suitable for that.
             | They can also be orders of magnitude more energy efficient
             | than running a full core just to twiddle some pins.
             | 
             | I've got a project that uses 4 hardware serial modules,
             | timers, ADC, event system etc all dedicated function. Sure,
             | they have their quirks but once you've learnt them you can
             | reuse a lot of the drivers across multiple products,
             | especially for a given vendor.
             | 
             | Of course there is some cost, but it's finding the balance
             | for your product that is important.
        
               | fidotron wrote:
               | > They can also be orders of magnitude more energy
               | efficient than running a full core just to twiddle some
               | pins.
               | 
               | This used to be true, but as fabrication shrinks first
               | you move to quasi FSMs (like the PIO blocks) and
               | eventually mini processors since those are smaller than
               | the dedicated units of the previous generation. When you
               | get the design a bit wrong you end up with the esp32
               | where the lack of general computation in peripherals
               | radically bumps memory requirements and so the power
               | usage.
               | 
               | This trend also occurs in GPUs where functionality
               | eventually gets merged into more uniform blocks to make
               | room for newly conceived specialist units that have
               | become viable.
        
               | awjlogan wrote:
               | No, still true - you're never going to beat the
               | determinism, size, and power of a few flops and some
               | logic to drive a common interface directly compared to a
               | full core with architectural state and memory. E.g., just
               | to enter an interrupt is 10-15 odd cycles, a memory
               | access or two to set a pin, and then 10-15 cycles again
               | to restore and exit.
               | 
               | Additionally, micros have to be much more robust electrically
               | than a cutting edge (or even 14 nm) CPU/GPU and available
               | for extended (decade) timespans so the economics driving
               | the shrink are different.
               | 
               | Small, fast cores have eaten the lunch of e.g. large
               | dedicated DSP blocks for sure but those are niche cases
               | where the volume is low so eventually the hardware cost
               | and cost to develop on weird architectures cost more
               | than running a general purpose core.
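
The interrupt overhead quoted in the comment above (10-15 cycles in, a memory access or two, 10-15 cycles out) can be turned into rough time figures. This is a sketch under stated assumptions: each memory access is counted as one cycle, and the 16 MHz and 320 MHz clocks are the ones mentioned earlier in the thread for PPR and FLPR, not measured values.

```python
def isr_cost(clock_hz: float, entry_cycles: int, access_cycles: int, exit_cycles: int) -> dict:
    """Estimate the cost of setting one pin from an interrupt handler."""
    to_pin = entry_cycles + access_cycles   # cycles until the pin actually changes
    total = to_pin + exit_cycles            # cycles the core is occupied overall
    ns_per_cycle = 1e9 / clock_hz
    return {
        "cycles_to_pin": to_pin,
        "cycles_total": total,
        "ns_to_pin": to_pin * ns_per_cycle,
        "ns_total": total * ns_per_cycle,
    }


for clock in (16e6, 320e6):
    best = isr_cost(clock, entry_cycles=10, access_cycles=1, exit_cycles=10)
    worst = isr_cost(clock, entry_cycles=15, access_cycles=2, exit_cycles=15)
    print(f"{clock / 1e6:.0f} MHz: pin changes after "
          f"{best['ns_to_pin']:.1f}-{worst['ns_to_pin']:.1f} ns, "
          f"core busy for {best['ns_total']:.1f}-{worst['ns_total']:.1f} ns")
```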
        
               | fidotron wrote:
               | > No, still true - you're never going to beat the
               | determinism, size, and power of a few flops and some
               | logic to drive a common interface directly compared to a
               | full core with architectural state and memory.
               | 
               | But you must know what you intend to do when designing
               | the MCU, and history shows (and some of the questioning
               | here also shows) that this isn't the case. As you point
               | out expected lifespans are long, so what is a designer to
               | do?
               | 
               | The ESP32 case is interesting because it comes so close,
               | to the point I believe the RMT peripheral probably partly
               | inspired the PIO, thanks to how widely it has been used
               | for other things and how it breaks.
               | 
               | The key weakness of the RMT is it expects the conversion
               | of the data structures to be used to control it to be
               | prepared in memory already, almost certainly by the CPU.
               | This means that to alter the data being sent out requires
               | the main app processor, the DMA and the peripheral to be
               | involved, and we are hammering the memory bus while doing
               | this.
               | 
               | A similar thing occurs with almost any non trivial SPI
               | usage where a lot of people end up building "big"
               | (relatively) buffers in memory in advance.
               | 
               | Both of those situations are very common and bad.
               | Assuming the tiny cores can have their own program memory
               | they will be no less deterministic than any other sort of
               | peripheral while radically freeing up the central part of
               | the system.
               | 
               | One of the main things I have learned over the years is
               | people wildly overstate the cost of computation and
               | understate the cost of moving data around. If you can
               | reduce the data a lot at the cost of a bit more
               | computation that is a big win.
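
The data-movement argument above can be illustrated with a rough byte count. This is a sketch, assuming a driver that expands every payload bit into one 32-bit symbol in RAM before the peripheral reads it (a common pattern for LED strips on RMT-style peripherals); the 4-byte symbol size, the 3 bytes per LED and the traffic model are illustrative assumptions, not figures from the thread.

```python
SYMBOL_BYTES = 4     # assumption: one 32-bit symbol per payload bit
BYTES_PER_LED = 3    # assumption: 24-bit colour per LED


def expanded_buffer_bytes(payload_bytes: int, symbol_bytes: int = SYMBOL_BYTES) -> int:
    """Size of the pre-expanded symbol buffer the CPU must build in memory."""
    return payload_bytes * 8 * symbol_bytes


def bus_traffic_bytes(payload_bytes: int, pre_expanded: bool) -> int:
    """Very rough count of bytes crossing the memory bus for one transfer."""
    if pre_expanded:
        buf = expanded_buffer_bytes(payload_bytes)
        # CPU reads the payload, writes the buffer, then the peripheral reads it.
        return payload_bytes + 2 * buf
    # A small core that encodes on the fly only needs to read the payload.
    return payload_bytes


payload = 100 * BYTES_PER_LED
print("payload bytes:        ", payload)
print("expanded buffer bytes:", expanded_buffer_bytes(payload))
print("traffic, pre-expanded:", bus_traffic_bytes(payload, pre_expanded=True))
print("traffic, on the fly:  ", bus_traffic_bytes(payload, pre_expanded=False))
```

Under these assumptions the buffer is 32 times the size of the payload, which is the "reduce the data at the cost of a bit more computation" trade described above.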
        
               | awjlogan wrote:
               | > But you must know what you intend to do when designing
               | the MCU, and history shows (and some of the questioning
               | here also shows) that this isn't the case. As you point
               | out expected lifespans are long, so what is a designer to
               | do?
               | 
               | Designers _do_ know that UARTs, SPIs, I2C, timers etc
               | will be around essentially forever. Anything new has to
               | be so much faster/better, the competition being the
               | status quo and its long tail, that you would lay down a
               | dedicated block anyway.
               | 
               | I think we'll disagree, but I'm not convinced by many of
               | the cases given here (usually DVI on an RP2040...) as you
               | would just buy a slightly higher spec and better
               | optimised system that has the interface already built in.
               | Personal opinion: great fun to play with and definitely
               | good to have a couple to handle niche interfaces (e.g.
               | OneWire), but not for majority of use cases.
               | 
               | > A similar thing occurs with almost any non trivial SPI
               | usage where a lot of people end up building "big"
               | (relatively) buffers in memory in advance.
               | 
               | This is neither here nor there for a "PIO" or a fixed
               | function - there has to be state and data _somewhere_. I
               | would rather allocate just what is needed for e.g. a UART
               | (on my weapon of choice, that amounts to a heady 40 bits
               | _local_ to the peripheral written once to configure it,
               | overloaded with SPI and I2C functionality) and not
               | trouble the memory bus other than for data (well said on
               | data movement, burns a lot and it's harder to capture).
               | 
               | > Assuming the tiny cores can have their own program
               | memory they will be no less deterministic than any other
               | sort of peripheral while radically freeing up the central
               | part of the system.
               | 
               | Agreed, only if it's dedicated to a single function of
               | course otherwise you have access contention. And, of
               | course, we already have radically freed up the central
               | part of the system :P
               | 
               | Regardless, enjoyed the conversation, thank you!
        
               | fidotron wrote:
               | > Regardless, enjoyed the conversation, thank you!
               | 
               | Likewise, very much so!
        
               | kragen wrote:
               | If you have a programmable state machine that's waiting
               | for a pin transition, it can easily do the thing it's
               | waiting to do in the clock cycle after that transition.
               | It doesn't have to enter an interrupt handler. That's how
               | the GA144 and the RP2350 do their I/O. Padauk chips have
               | a second hardware thread and deterministically context
               | switch every cycle, so the response latency is still less
               | than 10-15 cycles, like 1-2. I think old ARM FIQ state
               | also effectively works this way, switching register banks
               | on interrupt so no time is needed to save registers on
               | interrupt entry, and I think the original Z80 (RIP this
               | year) also has this feature. Some RISC-V cores
               | (CH32V003?) also have it.
               | 
               | An alternate register bank for the main CPU is bigger
               | than a PWM timer peripheral or an SPI peripheral, sure,
               | but you can program it to do things you didn't think of
               | before tapeout.
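
The "acts in the clock cycle after the transition" behaviour described above can be modelled with a tiny cycle-stepped simulation. This is a toy sketch, not a model of any specific chip: the state names and the one-sample-per-cycle input list are assumptions made for illustration.

```python
def run_wait_and_set(pin_samples):
    """Simulate a state machine that stalls until the input pin reads high,
    then drives its output high on the following cycle.

    pin_samples holds the input level seen on each clock cycle.
    Returns the output level on each cycle.
    """
    state = "WAIT"
    level_out = 0
    trace = []
    for pin in pin_samples:
        if state == "SET":
            level_out = 1          # act one cycle after the transition was seen
            state = "DONE"
        elif state == "WAIT" and pin:
            state = "SET"          # transition observed this cycle
        trace.append(level_out)
    return trace


pin = [0, 0, 0, 1, 1, 1, 1]
out = run_wait_and_set(pin)
print("input: ", pin)
print("output:", out)
print("response latency in cycles:", out.index(1) - pin.index(1))
```

In this model the latency is a single cycle and does not depend on what any other core is doing, which is the contrast with the 10-15 cycle interrupt entry discussed earlier.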
        
         | hn3er1q wrote:
         | Thank you so much for asking, I have oh so many requests...
         | 
         | Personally, I'm mostly interested in the ARM vs RISCV compare
         | and contrast.
         | 
         | - I'd be very interested in comparing static memory and ram
         | memory requirements for programs that are as similar as you can
         | make them at the c-level using whatever toolchain Nordic wants
         | you to use.
         | 
         | - Since you're looking to do deep dives I think looking into
         | differences in the interrupt architecture and any implications
         | on stack memory requirements and/or latency would be
         | interesting, especially as VPR is a "peripheral processor"
         | 
         | - It would be interesting to get cycle counts for similar
         | programs between ARM and RISCV. This might not be very
         | comparable though as it seems the ARM architectures are more
         | complex thus we expect a lower CPI from them. Anyway I think
         | CPI numbers would be interesting.
         | 
         | I could go on but I don't want to be greedy. :)
        
         | rwmj wrote:
         | Why did they go with the 64 bit Arm core instead of an RV64
         | core? (Or an alternative question: why go with the 32 bit
         | RISC-V core instead of an Arm M0?)
         | 
         | Does having mixed architectures cause any issues, for example
         | in developer tools or build systems? (I guess not, since
         | already having 32 vs 64 bit cores means you have effectively a
         | "mixed architecture" even if they were both Arm or RISC-V)
         | 
         | What's the RISC-V core derived from (eg. Rocket Chip? Pico?) or
         | is it their own design?
        
           | als0 wrote:
           | > Why did they go with the 64 bit Arm core
           | 
           | ARM Cortex-M33 is a 32-bit core, not 64-bit.
        
           | crest wrote:
           | They haven't gone with a 64Bit ARM core. The ARMv8*M* isn't
           | 64bit unlike ARMv8R and ARMv8A (the nomenclature can get
           | confusing). The differences between ARMv7M (especially with
           | the optional DSP and FPU extension) and ARMv8M mainline are
           | fairly minor unless you go bigger with an M55 or M85 which
           | (optionally) adds the Helium SIMD extension. At the low end
           | ARMv8M baseline adds a few quality of life features over
           | ARMv6M (e.g. the ability to reasonably efficiently load large
           | immediate values without resorting to constant pool). Also
           | the MPU got cleaned up to make it a little less annoying to
           | configure.
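
The point about loading large immediates without a constant pool can be sketched as follows. The comment does not name the instructions; this sketch assumes it refers to the MOVW/MOVT pair, which loads a 32-bit constant as two 16-bit halves. The helper names and the example constant are hypothetical.

```python
def split_movw_movt(value: int) -> tuple:
    """Split a 32-bit constant into the low half (for MOVW) and high half (for MOVT)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return value & 0xFFFF, value >> 16


def render_load(reg: str, value: int) -> list:
    """Render an assembly-like sequence that loads `value` into `reg`."""
    low, high = split_movw_movt(value)
    lines = [f"movw {reg}, #0x{low:04x}"]       # writes low half, clears high half
    if high:
        lines.append(f"movt {reg}, #0x{high:04x}")  # writes high half, keeps low half
    return lines


# 0x40021000 is an arbitrary example constant, not a real register address.
for line in render_load("r0", 0x40021000):
    print(line)
```

Without such a pair, a core typically falls back to a PC-relative load from a literal stored near the code, which costs a data access and a word of flash per constant.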
        
             | rwmj wrote:
             | Thanks for the clarification. Confusing terminology!
        
             | pm215 wrote:
             | ARMv8A and ARMv8R can both be pure 32 bit as well,
             | incidentally -- e.g. Cortex-A32 and Cortex-R52. v8A added
             | 64 bit, but it didn't take away 32 bit. It's not until v9A
             | that 32 bit at the OS level was removed, and even there
             | it's still allowed to implement 32 bit support for
             | userspace.
        
         | crest wrote:
         | Will open-source developers unable or unwilling to sign an NDA
         | get access to a toolchain to run their own code on the RISC-V
         | co-processors? Is the bus matrix documented somewhere? Does the
         | fast co-processor have access to DMA engines and interrupts?
        
           | janice1999 wrote:
           | FYI Nordic said on their YouTube channel that the RISC-V
           | toolchain that already ships with Zephyr's SDK will support
           | the cores. See around 00:56:32.520 [1]
           | 
           | [1] https://www.youtube.com/watch?v=ef87Gym_D5c
        
             | hasheddan wrote:
             | Indeed. It is used in this post to compile the Zephyr Hello
             | World example for the PPR.
        
       | Scene_Cast2 wrote:
       | I was looking into these recently. The current batch of the
       | nRF54L15's was recalled, so I wonder when mass availability will
       | happen. It looks like an interesting upgrade to the nRF52 though.
       | 
       | The reason why I was looking at it is because I'm trying to hook
       | up a 1kHz sampling ADC while streaming the data over BLE, and I
       | need either a good DMA engine that can trigger on an external pin
       | without a CPU interrupt, or a second core. I went down the dual
       | core route, but I'd love to hear people's experience with DMA on
       | the nRF52s and esp32-h2s: whether it's finicky, and whether it's
       | worth investing time into.
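
A quick budget for the use case above (1 kHz sampling streamed over BLE) can be sketched as follows. The 16-bit sample width and the 30 ms connection interval are assumptions chosen for illustration; the comment only states the sampling rate.

```python
def stream_budget(sample_rate_hz: float, bytes_per_sample: int, conn_interval_ms: float) -> dict:
    """Estimate how much data accumulates between BLE connection events."""
    bytes_per_second = sample_rate_hz * bytes_per_sample
    samples_per_interval = sample_rate_hz * conn_interval_ms / 1000
    return {
        "bytes_per_second": bytes_per_second,
        "kbit_per_second": bytes_per_second * 8 / 1000,
        "samples_per_interval": samples_per_interval,
        "bytes_per_interval": samples_per_interval * bytes_per_sample,
    }


budget = stream_budget(sample_rate_hz=1000, bytes_per_sample=2, conn_interval_ms=30)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under these assumptions the DMA buffer only has to hold a few dozen samples between connection events, so the data rate itself is modest; the open question in the comment is whether the sampling can be triggered without waking the CPU.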
        
         | dwnw wrote:
         | DMA will work fine. I remember a time when we were rewarded for
         | making do with what we had rather than putting a handful of CPU
         | cores on a sensor. These do not sound interesting to me,
         | honestly. It sounds more like they can't figure out what they
         | need to make.
        
           | gonzo wrote:
           | That was a time when gates were not as plentiful as they are
           | now. Both the new Nordic SoCs are designed at 22nm.
           | 
           | It can be a challenge as to what to use all the gates for.
        
       | bfrog wrote:
       | I kind of wonder why Nordic bothered sticking with arm cores at
       | all. The competition isnt
        
         | IshKebab wrote:
         | They'll probably replace them but given they had no RISC-V
         | expertise I imagine they made these small coprocessors first in
         | order to gain experience. You can't magically do everything all
         | at once.
        
         | eschneider wrote:
         | Ok, I'm on a project that just picked N54 processors for a set
         | of new designs. W/O arm cores it wouldn't even be considered.
         | The RISC-V cores are useful and will be put to work, but they
         | aren't exactly a selling point.
         | 
         | Why did we end up with N54s? Price, Performance, BLE6 support,
         | PRICE.
        
         | janice1999 wrote:
         | A lot of software stacks are still only optimised for ARM.
         | TrustZone is also far more mature and supported than the RISC-V
         | equivalents.
        
           | bfrog wrote:
           | Is TrustZone really all that useful? I haven't had a need for
           | anything like it myself!
           | 
           | esp-c6 module parts are like $2-3 and already have an fcc
           | cert, low power, and support all the same protocols I
           | believe. Are nrf54 modules really that cheap? Can I program
           | them natively with Rust yet?
        
             | jdietrich wrote:
             | If you care at all about security, then yes, it is
             | tremendously useful.
        
           | explodingwaffle wrote:
           | What would you consider the "RISC-V equivalent" of TrustZone?
           | Last time I was curious I didn't find anything.
           | 
           | (FWIW I agree with the other commenter that these
           | ""security"" features are useless, and feel to me more like
           | check-box compliance than anything else (Why does TrustZone
           | work with function calls? What's wrong with IPC! Also, what's
           | wrong with privileged mode?). Just seems like a bit of a
           | waste of silicon really.)
        
             | IshKebab wrote:
             | There are some examples listed here:
             | 
             | https://en.wikipedia.org/wiki/Trusted_execution_environment
        
             | jdietrich wrote:
             | MultiZone, OpenMZ, Keystone, maybe Penglai or ProvenCore, I
             | can't really keep up. That answer goes a long way to
             | explaining the appeal of TrustZone.
        