[HN Gopher] ESP32-S3 has a few SIMD instructions
       ___________________________________________________________________
        
       ESP32-S3 has a few SIMD instructions
        
       Author : _Microft
       Score  : 181 points
       Date   : 2024-05-05 16:21 UTC (1 days ago)
        
 (HTM) web link (bitbanksoftware.blogspot.com)
 (TXT) w3m dump (bitbanksoftware.blogspot.com)
        
       | adolph wrote:
       | _The Xtensa processor comes from Cadence and for some reason they
       | like to keep everything under NDA, even information which would
       | help people use their processors. I find it hard to understand
       | why the instruction set should be kept secret; a CPU vendor
       | should make it as easy as possible for engineers to use their
       | CPUs._
       | 
       | The P4 can't come soon enough to get off Xtensa.
        
         | jsheard wrote:
         | The P4 doesn't have a built-in radio though, so if you want
         | those beefy RISC-V cores you will need to integrate a second
         | ESP32 just to handle WiFi/BT :(
         | 
         | It will have USB-OTG and an LCD driver at least, which so far
         | have been missing from all of their RISC-V parts.
        
           | ComputerGuru wrote:
           | Cost optimization aside, that has always been the best way to
           | use an ESP chip. Just go with one of their barebone models
           | wired up as a peripheral to an ARM or RISC mcu.
        
             | clbrmbr wrote:
             | Yet it's possible to build some incredible applications on
             | top of just ESP32, especially with extra RAM.
        
             | devmunchies wrote:
             | We use esp32-s3 at my company (smart speaker) but we don't
             | do anything fancy.
             | 
             | Can you explain this? Why use esp as a peripheral if you
             | already have an ARM chip?
             | 
             | We were considering moving off of esp to something that
             | would make it easier to do cpu-bound AI inference on-device
             | or to enable more advanced audio DSP algos.
        
               | throwup238 wrote:
               | Based on cost and development time, it's usually just
               | easier to add an ESP and communicate to it using a
               | generic SPI library or something than to add a radio to
               | your PCB and get vendor libraries working on an arbitrary
               | platform.
        
           | timschmidt wrote:
           | Seems likely they'll continue releasing more models, further
           | integrating the features of the P4 and C6 for example. Maybe
           | we'll even get some risc-v SIMD instructions and support for
           | off-chip SRAM.
        
           | antoniuschan99 wrote:
           | I think that's totally fine. Might actually be the future
           | direction. But yea would've been nice to have wifi/bt
           | integrated.
           | 
           | E.g. if you want 5 GHz then use the C5, or if you want Wi-Fi 6
           | then the C6, etc.
           | 
           | Also here's a talk by them on how to use esp as a wifi
           | coprocessor https://youtu.be/g14aEjnjRLw?si=TgkEyJJ2_L_Shuom
           | 
           | There's also adafruit airlift
           | https://www.adafruit.com/product/4201
        
             | Aurornis wrote:
             | > I think that's totally fine. Might actually be the future
             | direction.
             | 
             | Using a second board just for WiFi is definitely not
             | totally fine for most applications. Having everything
             | integrated into a single package is important for
             | everything from reducing BOM count to lowering power
             | consumption to development simplicity.
        
           | yau8edq12i wrote:
           | Without the builtin radio it's really hard to justify the use
           | of an ESP32 over, say, an STM32. The integrated small package
           | with "everything" to make a fun project is the whole appeal.
        
             | AlotOfReading wrote:
             | Espressif has a huge advantage in lead times over ST
             | recently. I migrated a few projects over because ST
             | couldn't or wouldn't give us supply in under a month when
             | you could buy ESP chips and have them on your doorstep
             | practically overnight.
        
               | vbezhenar wrote:
               | Did you look at chinese STM clones? We used gd32, I liked
               | it.
        
               | 6SixTy wrote:
               | Wide product range. ARM and RISC-V all called GD32 with
               | an extra letter for the exact line.
        
               | makapuf wrote:
               | Depends if you want to condone IP theft (compatible
               | independent developments are of course different but I'm
               | not sure this is the case). R&D, Support, good
               | documentation in English and accuracy of specs come at a
               | price.
        
               | pantalaimon wrote:
               | Is implementing the same peripheral register API really
               | IP theft?
        
               | makapuf wrote:
               | No, this was my remark about being compatible.
        
               | vbezhenar wrote:
               | Espressif is a Chinese company, just like GigaDevice.
               | 
               | I don't think they do IP theft from STM32 (if it's even
               | possible at current node size). They have very thorough
               | datasheets different from STM32 ones and their own SDK
               | with unique code (although it seems to be inspired by
               | STM32 libraries, but absolutely not theft).
        
               | yau8edq12i wrote:
               | Nobody said that espressif steals IP from ST.
        
             | the__alchemist wrote:
             | That's a big differentiator - it's surprising that there is
             | no STM32 with Wi-Fi.
        
             | mort96 wrote:
             | That's the whole appeal to hobbyists, sure. But I'm
             | guessing Espressif wants to be considered for more serious
             | applications as well. Currently, there are good reasons to
             | choose, say, an STM32 over an ESP32 for a commercial
             | product if you don't need RF (or if RF is handled by
             | another part of the product, such as a SoM running Linux).
             | I'm guessing they wanna change that.
        
           | sitkack wrote:
           | The ESP8684H2 is 1.20 qty 1, more than enough to handle BT
           | and Wifi, then you can use any MCU you want as your
           | application processor.
        
         | ajross wrote:
         | FWIW this bit in the article is a little confused. The SIMD
         | instructions being detailed appear to be Espressif-custom
         | things implemented using Cadence's "TIE" facility.
         | 
         | Cadence does indeed have their own SIMD architecture ("HiFi",
         | really it's a family of similar but binary-incompatible ISAs).
         | And indeed docs for that don't appear in public (though if you
         | look carefully, details for how to emit the instructions are
         | part of the GNU toolchain integration).
         | 
         | But that isn't this. If you want docs for this, talk to
         | Espressif, not Cadence.
        
         | mianos wrote:
         | It is possible there is some licensing issue around the SIMD,
         | after all it is an optional component. It was available for the
         | LX6 as well, but not included. It's been a good run but it's
         | great they are going to RISC-V, at the very least for the
         | vibe. I have used both architectures, more recently using their
         | esp-idf and it is surprisingly uneventful to switch between
         | them. The only issue I had is the different high/low speed
         | timer devices between chips. In fact it is a surprise the on
         | chip peripheral hardware is incredibly compatible with their
         | idf. Sure, they have a layer for some calls but a lot is just
         | issuing commands to io devices directly, and the same between
         | RISC-V and Tensilica cores.
        
         | ajb wrote:
         | Xtensa is an unusual beast because its USP (at least, back when
         | it was owned by tensilica) was that you could easily add
         | extensions. Not just off the shelf ones - ones you defined
         | yourself. They had some automation that would generate a
         | toolchain for you to use with your shiny new instructions. Most
         | CPU architectures exist to allow programs written on one
         | implementation to work on another, with Xtensa it's kind of the
         | opposite - it exists to allow each chip to have its own special
         | sauce.
         | 
         | Honestly I was a bit surprised that espressif used it without
         | defining their own extension of some kind, if you're not doing
         | that then you might as well use something better known.
         | 
         | Edit: ajross* points out that this SIMD extension is such a
         | one, not an off the shelf one. So I guess that explains it.
         | 
         | * https://news.ycombinator.com/item?id=40267977
        
       | lunfard000 wrote:
       | Was it a secret? You could have guessed that something advertised
       | [0] for "AI" had some kind of SIMD. Even ChatGPT 3.5 can give
       | relevant code to use "AI" features [1].
       | 
       | 0: https://www.espressif.com/en/products/socs/esp32-s3
       | 
       | 1: https://chat.openai.com/share/3e1f990d-e8eb-4e56-acbb-
       | ad5a33...
        
         | iamflimflam1 wrote:
         | Not a secret - just not documented very well if at all.
         | 
         | We all knew there were SIMD instructions, but if there's no
         | information on how to use them or what they do...
        
           | lunfard000 wrote:
           | And the author is not documenting them either, just
           | announcing his new niche library. It is not like
           | disassembling a few functions to prove that they exist is
           | dark magic. I just don't see any value in the article.
        
             | iamflimflam1 wrote:
             | I'm not sure I'd call a JPEG decoding library "niche".
             | 
             | There are some numbers here on the performance improvements
             | he's managed to make.
             | 
             | https://atomic14.substack.com/p/even-faster-jpeg-decoding
        
             | bitbank wrote:
             | You need to go back and read it again. I provide links to
             | the relevant Espressif documents and in my next article I
             | provide a simple example to get started. Would you rather
             | have me copy the hundreds of pages of PDF into my blog post
             | instead of providing a link?
        
           | bobmcnamara wrote:
           | IIRC, they have 128bit alignment requirements, so tricky to
           | autovectorize.
        
             | bitbank wrote:
             | True - load and store mask off the bottom 4 bits of the
             | address. They try to help the situation by including an
             | instruction which can shift a pair of 128-bit registers by
             | bytes.
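
The masking and the byte-shift workaround described above can be modelled in a few lines of Python. This is only a sketch under assumptions, not Espressif's documented semantics: `aligned_load`, `unaligned_load`, and the flat `bytes` memory are hypothetical names made up for illustration, and the real instructions work on 128-bit registers rather than Python objects.

```python
VEC_BYTES = 16  # one 128-bit vector register holds 16 bytes


def aligned_load(mem: bytes, addr: int) -> bytes:
    """Model a 128-bit load that ignores the bottom 4 address bits."""
    base = addr & ~(VEC_BYTES - 1)  # mask off the low 4 bits of the address
    return mem[base:base + VEC_BYTES]


def unaligned_load(mem: bytes, addr: int) -> bytes:
    """Emulate an unaligned load with two aligned loads and a byte shift."""
    offset = addr & (VEC_BYTES - 1)            # distance past the aligned base
    lo = aligned_load(mem, addr)               # aligned block containing addr
    hi = aligned_load(mem, addr + VEC_BYTES)   # the next aligned block
    pair = lo + hi                             # stand-in for the register pair
    return pair[offset:offset + VEC_BYTES]     # shift the pair by `offset` bytes


if __name__ == "__main__":
    mem = bytes(range(64))
    print(list(aligned_load(mem, 19)))    # begins at byte 16, not 19
    print(list(unaligned_load(mem, 19)))  # begins at byte 19
```

The point of the model is that a plain load at address 19 silently returns the block starting at 16, so code that cannot guarantee 16-byte alignment pays for a second load plus a shift.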
        
         | relaxing wrote:
         | I love doing engineering based off of advertising material...
        
         | amelius wrote:
         | > Even ChatGPT 3.5 can give relevant code to use "AI" features
         | 
         | I've seen ChatGPT invent its own functions and commands ...
        
           | exe34 wrote:
           | if problem: solve_problem()
           | 
           | There, problem solved!
        
             | ssl-3 wrote:
             | # rest of problem-solving code goes here
        
           | makapuf wrote:
           | I've also definitely seen it reference invented methods on
           | APIs (that would have been very nice if they existed) - that
           | no past or future version implemented.
        
       | tzmlab wrote:
       | There's also a follow-up blog post "ESP32-S3 SIMD Minimal
       | Example" [0].
       | 
       | 0:
       | https://bitbanksoftware.blogspot.com/2024/01/esp32-s3-simd-m...
        
       | londons_explore wrote:
       | ESP_Sprite, former opensource-projects-guy, now Espressif
       | employee, is the best source of knowledge on this stuff.
       | 
       | Looks like back in 2021 they had an intention to document these,
       | but never quite got round to it:
       | 
       | https://esp32.com/viewtopic.php?p=88114&sid=f7f25776d9cfc6b6...
       | 
       | They do publish a bunch of opensource code that uses the SIMD
       | stuff, and an assembler, so it isn't secret, just very badly
       | documented.
        
         | londons_explore wrote:
         | Upon further inspection, it now seems like it is much better
         | documented...
         | 
         | Pages 37-301 of the reference manual seem to have all you'd
         | need, including binary instruction encodings, details on
         | instruction timings, etc.
         | 
         | https://www.espressif.com/sites/default/files/documentation/...
        
           | jononor wrote:
           | This looks pretty decent! And then esp-dsp can be used as
           | example code in some cases. Personally I want to have
           | accelerated Neural Networks for ESP32-S3 using this. Ideally
           | for https://github.com/sipeed/TinyMaix which is one of the
           | smallest and most hackable CNN implementations.
        
       | londons_explore wrote:
       | I had plans to use this SIMD support for some DSP algorithms on
       | camera video feeds.... But looking at how badly documented it is,
       | I may reconsider...
       | 
       | Without scatter/gather I don't think I'm gonna be able to meet my
       | timing requirements (I need to distort images through warping,
       | which is tricky to do without scatter/gather)
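
To make the scatter/gather point concrete, here is a minimal sketch of a nearest-neighbour warp written as a gather. The function `warp_gather` and the lookup tables `map_x` / `map_y` are hypothetical names for illustration, not anything from the poster's project or from an Espressif library.

```python
def warp_gather(src, map_x, map_y):
    """Nearest-neighbour warp: every output pixel reads one arbitrary source pixel."""
    height, width = len(map_y), len(map_y[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # The source coordinates come from per-pixel lookup tables, so
            # neighbouring outputs may read from unrelated addresses. That is
            # a gather, not a contiguous 16-byte vector load.
            out[y][x] = src[map_y[y][x]][map_x[y][x]]
    return out


if __name__ == "__main__":
    src = [[1, 2, 3],
           [4, 5, 6]]
    # A horizontal mirror expressed as lookup tables.
    map_x = [[2, 1, 0], [2, 1, 0]]
    map_y = [[0, 0, 0], [1, 1, 1]]
    print(warp_gather(src, map_x, map_y))
```

Without a hardware gather, the inner read has to be done one element at a time, which is why a SIMD unit limited to contiguous aligned loads helps little with this kind of access pattern.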
        
       | amelius wrote:
       | Where is a good overview of the various ESP32 chips available and
       | their features?
        
         | mort96 wrote:
         | Espressif has a pretty decent overview on their website:
         | https://www.espressif.com/en/products/socs
         | 
         | They also make a set of modules per chip, so you can get a
         | particular chip in an easier to use package with e.g a built-in
         | PCB antenna or antenna mounting ports or no antenna, various
         | onboard flash sizes, that sort of stuff:
         | https://www.espressif.com/en/products/modules
        
         | the__alchemist wrote:
         | S3 if you want more pins and fast
         | 
         | C3 if you don't, and are OK with RISC-V
         | 
         | PICO 3v2 otherwise.
        
           | tbyehl wrote:
           | S3 also has USB support that I've come to hugely appreciate
           | on dev boards... tho I just got some oddball single-port
           | boards that used a CH340 anyways. Grrr.
        
             | 15155 wrote:
             | S3 does not have real USB support (in the same way that any
             | STM32 with "USB support" does) - it has a USB-UART/JTAG
             | device that you cannot redefine built-in.
             | 
             | (edit: apparently it might? I can't find the documents on
             | how to actually use the OTG device over the fixed IP)
        
               | bitbank wrote:
               | I think you're thinking of the ESP32-C3. The S3 does have
               | a fully programmable USB port that can do things like
               | HID, mass storage, etc.
        
               | the__alchemist wrote:
               | I'm pretty confused by all this! Ie why they are set up
               | like this. Example: All STM32 dev boards have two USB
               | ports: One connected to the USB periph; one to a built-in
               | ST-Link (JTAG). You flash and debug/print to CLI off
               | JTAG, and use the USB one if your device needs to
               | communicate with a PC etc during operation, or if you
               | want to use DFU flashing for production boards or
               | firmware updates.
               | 
               | If you are designing a board, you will probably always
               | have the JTAG pins broken out to a port of your choice,
               | and use an external debugger. Wire USB A/R.
               | 
               | The USB dev boards for C3 seem to all have only a UART
               | bridge USB, and no JTAG! That is confusing because A: I'm
               | not sure if this is a full-up USB peripheral for use as
               | serial (But maybe not HID? Maybe it presents as USB-
               | serial to the PC, but you program it on the MCU like
               | UART?), and B: Why JTAG isn't table-stakes for a dev
               | board.
               | 
               | I ended up buying a "C3-Rust" devboard, because it was
               | the only one I found that had JTAG USB! (I am
               | coincidentally using Rust, but that's only superficially
               | relevant)
        
               | tbyehl wrote:
               | C3 boards are a mess, almost all of them are single-port
               | with a USB-UART chip despite supporting USB-CDC & JTAG.
               | Lots of C6 boards are dual-port but sometimes they're
               | single-port, typically without a USB-UART chip. And to
               | make it more confusing, sometimes a single-port C6 board
               | will have a USB-UART chip that's hanging off a USB hub
               | chip.
               | 
               | S3 has full USB-OTG support and most boards are either
               | dual-port or single-port without a USB-UART chip. I dig
               | 'em because I can put the TinyUF2 bootloader on them and
               | get that Pi Pico experience of having them come up as
               | mass storage.
               | 
               | Except for these S3 boards I bought in Arduino Uno format
               | where the designer made every decision as wrongly as
               | possible -- single USB, USB-UART chip, and the USB pins
               | broken out to Arduino pins instead of using one of the
               | optional headers they added.
        
       | hrydgard wrote:
       | Nice!
       | 
       | Could save a couple of cycles per iteration by preloading the
       | shift amounts into several GPRs before entering the loop, instead
       | of initializing them just before use.
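
The article's assembly is not reproduced in this thread, so the following is only a generic illustration of the idea (hoisting loop-invariant setup out of the loop), with hypothetical function names and shift values. In Python the saving is negligible; on the S3 the equivalent change would remove the register-initialising instructions from every pass through the loop.

```python
def sum_shifted_inline(samples):
    """Shift amounts are set up again on every iteration."""
    out = []
    for s in samples:
        shift_hi = 4  # re-initialised each time round the loop
        shift_lo = 2
        out.append((s >> shift_hi) + (s >> shift_lo))
    return out


def sum_shifted_hoisted(samples):
    """Shift amounts are set up once, before entering the loop."""
    shift_hi = 4  # loop-invariant, so initialise it a single time
    shift_lo = 2
    out = []
    for s in samples:
        out.append((s >> shift_hi) + (s >> shift_lo))
    return out


if __name__ == "__main__":
    data = [64, 16, 255]
    # Both versions compute the same result; only the setup cost differs.
    print(sum_shifted_inline(data) == sum_shifted_hoisted(data))
```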
        
       | DeathArrow wrote:
       | What about ESP32-C3, using RISC-V architecture, does it also have
       | SIMD instructions?
        
         | guntars wrote:
         | It does not. There are RISC-V chips out there (T-Head 906) that
         | have a pre-1.0 vector extension, but these are 64-bit
         | application processors. I'm sure we'll see ESP32 RISC-V chips
         | with SIMD in the next few years.
        
       ___________________________________________________________________
       (page generated 2024-05-06 23:01 UTC)