[HN Gopher] ESP32-S3 has a few SIMD instructions
___________________________________________________________________
ESP32-S3 has a few SIMD instructions
Author : _Microft
Score : 181 points
Date : 2024-05-05 16:21 UTC (1 days ago)
(HTM) web link (bitbanksoftware.blogspot.com)
(TXT) w3m dump (bitbanksoftware.blogspot.com)
| adolph wrote:
| _The Xtensa processor comes from Cadence and for some reason they
| like to keep everything under NDA, even information which would
| help people use their processors. I find it hard to understand
| why the instruction set should be kept secret; a CPU vendor
| should make it as easy as possible for engineers to use their
| CPUs._
|
| The P4 can't come soon enough to get off Xtensa.
| jsheard wrote:
| The P4 doesn't have a built-in radio though, so if you want
| those beefy RISC-V cores you will need to integrate a second
| ESP32 just to handle WiFi/BT :(
|
| It will have USB-OTG and an LCD driver at least, which so far
| have been missing from all of their RISC-V parts.
| ComputerGuru wrote:
| Cost optimization aside, that has always been the best way to
| use an ESP chip. Just go with one of their barebone models
| wired up as a peripheral to an ARM or RISC mcu.
| clbrmbr wrote:
| Yet it's possible to build some incredible applications on
| top of just ESP32, especially with extra RAM.
| devmunchies wrote:
| We use esp32-s3 at my company (smart speaker) but we don't
| do anything fancy.
|
| Can you explain this? Why use esp as a peripheral if you
| already have an ARM chip?
|
| We were considering moving off of esp to something that
| would make it easier to do cpu-bound AI inference on-device
| or to enable more advanced audio DSP algos.
| throwup238 wrote:
| Based on cost and development time, it's usually just
| easier to add an ESP and communicate to it using a
| generic SPI library or something than to add a radio to
| your PCB and get vendor libraries working on an arbitrary
| platform.
| timschmidt wrote:
| Seems likely they'll continue releasing more models, further
| integrating the features of the P4 and C6 for example. Maybe
| we'll even get some risc-v SIMD instructions and support for
| off-chip SRAM.
| antoniuschan99 wrote:
| I think that's totally fine. Might actually be the future
| direction. But yea would've been nice to have wifi/bt
| integrated.
|
| Eg if you want 5ghz then use c5, or if you want wifi-6 then
| use c6, etc.
|
| Also here's a talk by them on how to use esp as a wifi
| coprocessor https://youtu.be/g14aEjnjRLw?si=TgkEyJJ2_L_Shuom
|
| There's also adafruit airlift
| https://www.adafruit.com/product/4201
| Aurornis wrote:
| > I think that's totally fine. Might actually be the future
| direction.
|
| Using a second board just for WiFi is definitely not
| totally fine for most applications. Having everything
| integrated into a single package is important for everything
| from reducing BOM count to lower power consumption to
| development simplicity.
| yau8edq12i wrote:
| Without the builtin radio it's really hard to justify the use
| of an ESP32 over, say, an STM32. The integrated small package
| with "everything" to make a fun project is the whole appeal.
| AlotOfReading wrote:
| Espressif has a huge advantage in lead times over ST
| recently. I migrated a few projects over because ST
| couldn't or wouldn't give us supply in under a month when
| you could buy ESP chips and have them on your doorstep
| practically overnight.
| vbezhenar wrote:
| Did you look at chinese STM clones? We used gd32, I liked
| it.
| 6SixTy wrote:
| Wide product range. ARM and RISC-V all called GD32 with
| an extra letter for the exact line.
| makapuf wrote:
| Depends if you want to condone IP theft (compatible
| independent developments is of course different but I'm
| not sure this is the case). R&D, Support, good
| documentation in English and accuracy of specs come at a
| price.
| pantalaimon wrote:
| Is implementing the same peripheral register API really
| IP theft?
| makapuf wrote:
| No, this was my remark about being compatible.
| vbezhenar wrote:
| Espressif is a Chinese company, just like GigaDevice.
|
| I don't think they do IP theft from STM32 (if it's even
| possible at current node size). They have very thorough
| datasheets different from STM32 ones and their own SDK
| with unique code (although it seems to be inspired by
| STM32 libraries, but absolutely not theft).
| yau8edq12i wrote:
| Nobody said that espressif steals IP from ST.
| the__alchemist wrote:
| That's a big differentiator - it's surprising that there is
| no STM32 with Wi-Fi.
| mort96 wrote:
| That's the whole appeal to hobbyists, sure. But I'm
| guessing Espressif wants to be considered for more serious
| applications as well. Currently, there are good reasons to
| choose, say, an STM32 over an ESP32 for a commercial
| product if you don't need RF (or if RF is handled by
| another part of the product, such as a SoM running Linux).
| I'm guessing they wanna change that.
| sitkack wrote:
| The ESP8684H2 is 1.20 qty 1, more than enough to handle BT and
| Wifi, then you can use any MCU you want as your application
| processor.
| ajross wrote:
| FWIW this bit in the article is a little confused. The SIMD
| instructions being detailed appear to be Espressif-custom
| things implemented using Cadence's "TIE" facility.
|
| Cadence does indeed have their own SIMD architecture ("HiFi",
| really it's a family of similar but binary-incompatible ISAs).
| And indeed docs for that don't appear in public (though if you
| look carefully, details for how to emit the instructions are
| part of the GNU toolchain integration).
|
| But that isn't this. If you want docs for this, talk to
| Espressif, not Cadence.
| mianos wrote:
| It is possible there is some licensing issue around the SIMD,
| after all it is an optional component. It was available for the
| LX6 as well, but not included. It's been a good run but it's
| great they are going to RISC-V, at the very least for the
| vibe. I have used both architectures, more recently using their
| esp-idf and it is surprisingly uneventful to switch between
| them. The only issue I had is the different high/low speed
| timer devices between chips. In fact it is a surprise the on
| chip peripheral hardware is incredibly compatible with their
| idf. Sure, they have a layer for some calls but a lot is just
| issuing commands to io devices directly, and the same between
| RISC-V and Tensilica cores.
| ajb wrote:
| Xtensa is an unusual beast because its USP (at least, back when
| it was owned by tensilica) was that you could easily add
| extensions. Not just off the shelf ones - ones you defined
| yourself. They had some automation that would generate a
| toolchain for you to use with your shiny new instructions. Most
| CPU architectures exist to allow programs written on one
| implementation to work on another, with Xtensa it's kind of the
| opposite - it exists to allow each chip to have its own special
| sauce.
|
| Honestly I was a bit surprised that espressif used it without
| defining their own extension of some kind, if you're not doing
| that then you might as well use something better known.
|
| Edit: ajross* points out that this SIMD extension is such a
| one, not an off the shelf one. So I guess that explains it.
|
| * https://news.ycombinator.com/item?id=40267977
| lunfard000 wrote:
| Was it a secret? You could have guessed that something advertised
| [0] for "AI" had some kind of SIMD. Even ChatGPT 3.5 can give
| relevant code to use "AI" features [1].
|
| 0: https://www.espressif.com/en/products/socs/esp32-s3
|
| 1: https://chat.openai.com/share/3e1f990d-e8eb-4e56-acbb-ad5a33...
| iamflimflam1 wrote:
| Not a secret - just not documented very well if at all.
|
| We all knew there were SIMD instructions, but if there's no
| information on how to use them or what they do...
| lunfard000 wrote:
| And the author is not documenting them either, just
| announcing his new niche library. It is not like
| disassembling a few functions to prove that they exist is
| dark magic. I just don't see any value in the article.
| iamflimflam1 wrote:
| I'm not sure I'd call a JPEG decoding library "niche".
|
| There are some numbers here on the performance improvements
| he's managed to make.
|
| https://atomic14.substack.com/p/even-faster-jpeg-decoding
| bitbank wrote:
| You need to go back and read it again. I provide links to
| the relevant Espressif documents and in my next article I
| provide a simple example to get started. Would you rather
| have me copy the hundreds of pages of PDF into my blog post
| instead of providing a link?
| bobmcnamara wrote:
| IIRC, they have 128bit alignment requirements, so tricky to
| autovectorize.
| bitbank wrote:
| True - load and store mask off the bottom 4 bits of the
| address. They try to help the situation by including an
| instruction which can shift a pair of 128-bit registers by
| bytes.
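
[Editor's illustration, not part of the comment above.] A minimal Python model of the behavior described here, assuming a 16-byte (128-bit) vector width. The names `aligned_load` and `unaligned_load` are hypothetical and do not correspond to actual ESP32-S3 instructions; the sketch only shows how dropping the low 4 address bits forces an aligned read, and how two aligned reads plus a byte shift can recover unaligned data.

```python
VECTOR_BYTES = 16  # 128-bit vector register, as described in the thread


def aligned_load(memory: bytes, addr: int) -> bytes:
    """Model a load that ignores the low 4 bits of the address."""
    base = addr & ~(VECTOR_BYTES - 1)  # mask off the bottom 4 bits
    return memory[base:base + VECTOR_BYTES]


def unaligned_load(memory: bytes, addr: int) -> bytes:
    """Emulate an unaligned 16-byte read using two aligned loads."""
    offset = addr & (VECTOR_BYTES - 1)
    low = aligned_load(memory, addr)
    high = aligned_load(memory, addr + VECTOR_BYTES)
    # Treat the register pair as one 32-byte value and shift by `offset` bytes.
    return (low + high)[offset:offset + VECTOR_BYTES]


# Hypothetical test buffer: byte i holds the value i.
memory = bytes(range(64))
```

In this model, `aligned_load(memory, 5)` returns the block starting at address 0, which is why a compiler cannot blindly vectorize loops over arbitrarily aligned buffers without the extra shift step.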
| relaxing wrote:
| I love doing engineering based off of advertising material...
| amelius wrote:
| > Even ChatGPT 3.5 can give relevant code to use "AI" features
|
| I've seen ChatGPT invent its own functions and commands ...
| exe34 wrote:
| if problem: solve_problem()
|
| There, problem solved!
| ssl-3 wrote:
| # rest of problem-solving code goes here
| makapuf wrote:
| I've also definitely seen it reference invented methods on
| APIs (that would have been very nice if they existed) - that
| no past or future version implemented.
| tzmlab wrote:
| There's also a follow-up blog post "ESP32-S3 SIMD Minimal
| Example" [0].
|
| 0:
| https://bitbanksoftware.blogspot.com/2024/01/esp32-s3-simd-m...
| londons_explore wrote:
| ESP_Sprite, former opensource-projects-guy, now Espressif
| employee, is the best source of knowledge on this stuff.
|
| Looks like back in 2021 they had an intention to document these,
| but never quite got round to it:
|
| https://esp32.com/viewtopic.php?p=88114&sid=f7f25776d9cfc6b6...
|
| They do publish a bunch of opensource code that uses the SIMD
| stuff, and an assembler, so it isn't secret, just very badly
| documented.
| londons_explore wrote:
| Upon further inspection, it now seems like it is much better
| documented...
|
| Page 37-301 of the reference manual seems to have all you'd
| need, including binary instruction encodings, details on
| instruction timings, etc.
|
| https://www.espressif.com/sites/default/files/documentation/...
| jononor wrote:
| This looks pretty decent! And then esp-dsp can be used as
| example code in some cases. Personally I want to have
| accelerated Neural Networks for ESP32-S3 using this. Ideally
| for https://github.com/sipeed/TinyMaix which is one of the
| smallest and most hackable CNN implementations.
| londons_explore wrote:
| I had plans to use this SIMD support for some DSP algorithms on
| camera video feeds.... But looking at how badly documented it is,
| I may reconsider...
|
| Without scatter/gather I don't think I'm gonna be able to meet my
| timing requirements (I need to distort images through warping,
| which is tricky to do without scatter/gather)
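
[Editor's illustration, not part of the comment above.] A minimal Python sketch of why image warping wants gather support, assuming a nearest-neighbour warp driven by per-pixel source coordinates. The function `warp_nearest` and the `map_x`/`map_y` lookup tables are hypothetical names chosen for this example, not anything from the commenter's project or from Espressif's libraries.

```python
def warp_nearest(src, width, height, map_x, map_y):
    """Warp a row-major grayscale image using per-pixel source coordinates."""
    out = [0] * (width * height)
    for i in range(width * height):
        sx = min(max(map_x[i], 0), width - 1)   # clamp to image bounds
        sy = min(max(map_y[i], 0), height - 1)
        # Data-dependent address: neighbouring outputs may read far-apart
        # inputs, so one contiguous 16-byte vector load cannot fetch them.
        out[i] = src[sy * width + sx]
    return out


# Hypothetical 2x2 image, flipped horizontally through the coordinate maps.
src = [10, 20, 30, 40]
flipped = warp_nearest(src, 2, 2, map_x=[1, 0, 1, 0], map_y=[0, 0, 1, 1])
```

The indexed read `src[sy * width + sx]` is the gather: without a hardware instruction for it, each lane has to be loaded with scalar code.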
| amelius wrote:
| Where is a good overview of the various ESP32 chips available and
| their features?
| mort96 wrote:
| Espressif has a pretty decent overview on their website:
| https://www.espressif.com/en/products/socs
|
| They also make a set of modules per chip, so you can get a
| particular chip in an easier to use package with e.g a built-in
| PCB antenna or antenna mounting ports or no antenna, various
| onboard flash sizes, that sort of stuff:
| https://www.espressif.com/en/products/modules
| the__alchemist wrote:
| S3 if you want more pins and fast
|
| C3 if you don't, and are OK with RISC-V
|
| PICO 3v2 otherwise.
| tbyehl wrote:
| S3 also has USB support that I've come to hugely appreciate
| on dev boards... tho I just got some oddball single-port
| boards that used a CH340 anyways. Grrr.
| 15155 wrote:
| S3 does not have real USB support (in the same way that any
| STM32 with "USB support" does) - it has a built-in USB-UART/JTAG
| device that you cannot redefine.
|
| (edit: apparently it might? I can't find the documents on
| how to actually use the OTG device over the fixed IP)
| bitbank wrote:
| I think you're thinking of the ESP32-C3. The S3 does have
| a fully programmable USB port that can do things like
| HID, mass storage, etc.
| the__alchemist wrote:
| I'm pretty confused by all this! Ie why they are set up
| like this. Example: All STM32 dev boards have two USB
| ports: One connected to the USB periph; one to a built-in
| ST-Link (JTAG). You flash and debug/print to CLI off
| JTAG, and use the USB one if your device needs to
| communicate with a PC etc during operation, or if you
| want to use DFU flashing for production boards or
| firmware updates.
|
| If you are designing a board, you will probably always
| have the JTAG pins broken out to a port of your choice,
| and use an external debugger. Wire USB A/R.
|
| The USB dev boards for C3 seem to all have only a UART
| bridge USB, and no JTAG! That is confusing because A: I'm
| not sure if this is a full-up USB peripheral for use as
| serial (But maybe not HID? Maybe it presents as USB-
| serial to the PC, but you program it on the MCU like
| UART?), and B: Why JTAG isn't table-stakes for a dev
| board.
|
| I ended up buying a "C3-Rust" devboard, because it was
| the only one I found that had JTAG USB! (I am
| coincidentally using Rust, but that's only superficially
| relevant)
| tbyehl wrote:
| C3 boards are a mess, almost all of them are single-port
| with a USB-UART chip despite supporting USB-CDC & JTAG.
| Lots of C6 boards are dual-port but sometimes they're
| single-port, typically without a USB-UART chip. And to
| make it more confusing, sometimes a single-port C6 board
| will have a USB-UART chip that's hanging off a USB hub
| chip.
|
| S3 has full USB-OTG support and most boards are either
| dual-port or single-port without a USB-UART chip. I dig
| 'em because I can put the TinyUF2 bootloader on them and
| get that Pi Pico experience of having them come up as
| mass storage.
|
| Except for these S3 boards I bought in Arduino Uno format
| where the designer made every decision as wrongly as
| possible -- single USB, USB-UART chip, and the USB pins
| broken out to Arduino pins instead of using one of the
| optional headers they added.
| hrydgard wrote:
| Nice!
|
| Could save a couple of cycles per iteration by preloading the
| shift amounts into several GPRs before entering the loop, instead
| of initializing them just before use.
| DeathArrow wrote:
| What about ESP32-C3, using RISC-V architecture, does it also have
| SIMD instructions?
| guntars wrote:
| It does not. There are RISC-V chips out there (T-Head 906) that
| have a pre-1.0 vector extension, but these are 64-bit
| application processors. I'm sure we'll see ESP32 RISC-V chips
| with SIMD in the next few years.
___________________________________________________________________
(page generated 2024-05-06 23:01 UTC)