[HN Gopher] SIMD-accelerated computer vision on a $2 microcontro...
       ___________________________________________________________________
        
       SIMD-accelerated computer vision on a $2 microcontroller
        
       Author : shraiwi
       Score  : 280 points
       Date   : 2024-06-25 02:10 UTC (20 hours ago)
        
 (HTM) web link (shraiwi.github.io)
 (TXT) w3m dump (shraiwi.github.io)
        
       | restricted_ptr wrote:
       | I wonder whether the ESP32 has VLIW slots and whether tighter
       | instruction packing is possible?
        
         | duskwuff wrote:
         | Neither Xtensa nor RISC-V is a VLIW architecture.
        
           | restricted_ptr wrote:
           | The Xtensa architecture is flexible and extendable by the user.
           | The ability to define new instructions, hardware features, and
           | VLIW configurations is one of its key features. You can find
           | more details online:
           | https://en.m.wikipedia.org/wiki/Tensilica
        
           | thrtythreeforty wrote:
           | Generally speaking, this is not correct. _Base_ Xtensa is not
           | VLIW, but Xtensa's various vector extensions do allow VLIW
           | instructions, collectively called "FLIX."
           | 
           | It is doubtful that ESP32's Xtensa is VLIW-capable, though.
           | Presumably their compiler would emit FLIX instructions if it
           | were.
        
       | westurner wrote:
       | > _As I've been really interested in computer vision lately, I
       | decided on writing a SIMD-accelerated implementation of the FAST
       | feature detector for the ESP32-S3_ [...]
       | 
       | > _In the end, I was able to improve the throughput of the FAST
       | feature detector by about 220%, from 5.1MP/s to 11.2MP/s in my
       | testing. This is well within the acceptable range of performance
       | for realtime computer vision tasks, enabling the ESP32-S3 to
       | easily process a 30fps VGA stream._
       | 
       | What are some use cases for FAST?
       | 
       | Features from accelerated segment test:
       | https://en.wikipedia.org/wiki/Features_from_accelerated_segm...
       | 
       | Is there TPU-like functionality in anything in this price range
       | of chips yet?
       | 
       | Neon is an optional SIMD instruction set extension for ARMv7 and
       | ARMv8, so the Pi Zero 2 and larger boards have SIMD extensions
       | (the original Pi Zero's ARMv6 core does not include Neon).
       | 
       | Orin Nano has 40 TOPS, which is sufficient for Copilot+ AFAIU.
       | "A PCIe Coral TPU Finally Works on Raspberry Pi 5"
       | https://news.ycombinator.com/item?id=38310063
       | 
       | From https://phys.org/news/2024-06-infrared-visible-device-2d-mat... :
       | 
       | > _Using this method, they were able to up-convert infrared light
       | of wavelength around 1550 nm to 622 nm visible light. The output
       | light wave can be detected using traditional silicon-based
       | cameras._
       | 
       | > _"This process is coherent--the properties of the input beam
       | are preserved at the output. This means that if one imprints a
       | particular pattern in the input infrared frequency, it
       | automatically gets transferred to the new output frequency,"
       | explains Varun Raghunathan, Associate Professor in the Department
       | of Electrical Communication Engineering (ECE) and corresponding
       | author of the study published in Laser & Photonics Reviews._
       | 
       | "Show HN: PicoVGA Library - VGA/TV Display on Raspberry Pi Pico"
       | https://news.ycombinator.com/item?id=35117847#35120403
       | https://news.ycombinator.com/item?id=40275530
       | 
       | "Designing a SIMD Algorithm from Scratch"
       | https://news.ycombinator.com/item?id=38450374
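
The throughput figures quoted above can be checked with a little arithmetic. This is a minimal sketch that assumes "VGA" means 640x480 pixels and takes the 5.1 MP/s and 11.2 MP/s numbers directly from the quoted article; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the quoted FAST throughput numbers.
# Assumption: "VGA" means 640x480 pixels per frame.
VGA_WIDTH, VGA_HEIGHT = 640, 480
FPS = 30

baseline_mp_s = 5.1   # quoted scalar throughput, megapixels per second
simd_mp_s = 11.2      # quoted SIMD throughput, megapixels per second

# Pixel rate that a 30 fps VGA stream demands, in megapixels per second.
required_mp_s = VGA_WIDTH * VGA_HEIGHT * FPS / 1e6

speedup = simd_mp_s / baseline_mp_s      # ratio of new to old throughput
headroom = simd_mp_s / required_mp_s     # margin over the 30 fps requirement

print(f"required: {required_mp_s:.3f} MP/s")
print(f"speedup:  {speedup:.2f}x")
print(f"headroom: {headroom:.2f}x")
```

Under these assumptions a 30 fps VGA stream needs about 9.2 MP/s, which the quoted scalar rate cannot sustain and the quoted SIMD rate can. The quoted "220%" matches the ratio of the two rates (about 2.2x), which is an increase of roughly 120%.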
        
         | shraiwi wrote:
         | Thanks for reading!
         | 
         | > What are some use cases for FAST?
         | 
         | The FAST feature detector is an algorithm for finding regions
         | of an image that are visually distinctive, which can be used as
         | a first step in motion tracking and SLAM (simultaneous
         | localization and mapping) algorithms typically seen in XR,
         | robotics, etc.
         | 
         | > Is there TPU-like functionality in anything in this price
         | range of chips yet?
         | 
         | I think that in the case of the ESP32-S3, its SIMD instructions
         | are designed to accelerate the inference of quantized AI models
         | (see: https://github.com/espressif/esp-dl), and also some
         | signal processing like FFTs. I guess you could call the SIMD
         | instructions TPU-like, in the sense that the chip has specific
         | instructions that facilitate ML inference (EE.VRELU.Sx
         | performs the ReLU operation). Using these instructions still
         | takes CPU time, whereas TPUs are typically their own
         | processing cores, operating asynchronously. I'd say this is
         | closer to ARM NEON.
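
For readers unfamiliar with FAST, the segment test described above can be sketched in a few lines of scalar Python. This is a reference illustration of the classic formulation (16-pixel circle of radius 3, at least 9 contiguous pixels), not the author's SIMD implementation; the threshold of 20, the arc length of 9, and the function names are assumptions chosen for the example.

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3,
# listed in order around the circle.
CIRCLE = [
    (0, -3), (1, -3), (2, -2), (3, -1),
    (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1),
    (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def is_fast_corner(img, x, y, threshold=20, arc_len=9):
    """Segment test: True if arc_len contiguous circle pixels are all
    brighter than center + threshold or all darker than center - threshold.
    The caller must keep (x, y) at least 3 pixels away from the border."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # 1 looks for a brighter arc, -1 for a darker arc
        flags = [sign * (p - center) > threshold for p in ring]
        run = 0
        # Walk the ring twice so arcs that wrap past the last index count.
        for flag in flags + flags:
            run = run + 1 if flag else 0
            if run >= arc_len:
                return True
    return False


def detect_corners(img, threshold=20, arc_len=9):
    """Run the segment test on every pixel that has a full circle around it."""
    h, w = len(img), len(img[0])
    return [(x, y)
            for y in range(3, h - 3)
            for x in range(3, w - 3)
            if is_fast_corner(img, x, y, threshold, arc_len)]
```

Here `img` is a list of rows of grayscale values. On a bright square over a dark background, the square's corner pixel passes the test (11 contiguous darker pixels), while a pixel in the middle of a straight edge does not (only 7). Production implementations add a quick rejection test and non-maximum suppression, both omitted here.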
        
           | kylixz wrote:
           | Interested in doing more of this type of work optimizing a
           | SLAM/factorgraph pipeline?
           | 
           | My email is in my bio; I would love to chat!
        
           | implements wrote:
           | > The FAST feature detector is an algorithm for finding
           | regions of an image that are visually distinctive, ...
           | 
           | Is that related to 'Energy Function' in any way?
           | 
           | (I ask because a long time ago I was involved in an Automated
           | Numberplate Reading startup that was using an FPGA to quickly
           | find the vehicle numberplate in an image)
        
             | ska wrote:
             | What you are thinking of operates at a different level of
             | abstraction. Energy functions are a general way of
             | structuring a problem, used (sometimes abused) to apply an
             | optimization algorithm to find a reasonable solution for
             | it.
             | 
             | FAST is an algorithm for efficiently looking for
             | "interesting" parts (basically, corners) of an image, so
             | you can safely (in theory) ignore the rest of it. The
             | output from a feature detector may end up contributing to
             | an energy function later, directly or indirectly.
        
           | westurner wrote:
           | SimSIMD https://github.com/ashvardanian/SimSIMD :
           | 
           | > _Up to 200x Faster Inner Products and Vector Similarity --
           | for Python, JavaScript, Rust, C, and Swift, supporting f64,
           | f32, f16 real & complex, i8, and binary vectors using SIMD
           | for both x86 AVX2 & AVX-512 and Arm NEON & SVE_
           | 
           | github.com/topics/simd: https://github.com/topics/simd
           | 
           | https://news.ycombinator.com/item?id=37805810#37808036
        
         | yatopifo wrote:
         | > Is there TPU-like functionality in anything in this price
         | range of chips yet?
         | 
         | Kendryte K210 supports 1x1 and 3x3 convolutions on the "TPU".
         | It was pretty well supported in terms of software &
         | documentation but sadly it hasn't become popular.
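
To make the 1x1 and 3x3 convolutions mentioned above concrete, here is a minimal pure-Python sketch of both operations on small integer images. It only illustrates the arithmetic an accelerator would perform; it says nothing about how the K210 implements it, and the function names are hypothetical.

```python
def conv3x3(img, kernel):
    """Single-channel 3x3 convolution (cross-correlation form, as used in
    CNNs), stride 1, no padding: the output is (h - 2) x (w - 2)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += img[y + ky][x + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out


def conv1x1(channels, weights):
    """1x1 convolution: a per-pixel weighted sum across input channels,
    producing one output channel."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wt * ch[y][x] for wt, ch in zip(weights, channels))
             for x in range(w)]
            for y in range(h)]
```

A 3x3 kernel mixes each pixel with its eight neighbours, while a 1x1 kernel only mixes channels at the same pixel, which is why the two are often paired in small networks.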
         | 
         | These days, you can easily find cheap RV1103 ("LuckFox"), BL808
         | ("Ox64/Pine64") and CV1800B/SG2002 ("MilkV") based dev boards,
         | all of which have some sort of basic TPU. Unfortunately, they
         | are designed to be Linux boards, meaning that all TPU-related
         | stuff is extremely abstracted with zero under-the-hood
         | documentation. So it's absolutely unclear whether their TPUs
         | are real or faked with clever code optimizations.
        
           | koerakoonlane wrote:
           | > These days, you can easily find cheap RV1103 ("LuckFox"),
           | BL808 ("Ox64/Pine64") and CV1800B/SG2002 ("MilkV") based dev
           | boards, all of which have some sort of basic TPU.
           | Unfortunately, they are designed to be Linux boards, meaning
           | that all TPU-related stuff is extremely abstracted with zero
           | under-the-hood documentation. So it's absolutely unclear
           | whether their TPUs are real or faked with clever code
           | optimizations.
           | 
           | They all have a TPU in hardware; my team has been verifying and
           | benchmarking them. Documentation is only available for the
           | high-level C APIs of the libraries that a programmer is
           | expected to use, and even that tends to be extremely lacking.
        
       | evanjrowley wrote:
       | A comparable board is the ESP32-CAM, which is supported by this
       | really practical computer vision project:
       | https://github.com/jomjol/AI-on-the-edge-device?tab=readme-o...
        
         | maven29 wrote:
         | There is an ESP32-S3 version of this camera breakout board,
         | which is presumably what OP might have used for prototyping.
         | 
         | The S3 variant easily justifies the slight additional cost,
         | given that it's easily faster by an order of magnitude or
         | greater, having SIMD and an FPU.
         | 
         | https://github.com/espressif/esp-dl/tree/master/examples/fac...
        
         | hi-v-rocknroll wrote:
         | In the CV department, I recently ordered a cheap FPGA + ARM
         | Cortex-M3 + 64 Mbit SRAM + 32 Mbit flash that does camera input
         | and HDMI output. Like a budget Zynq for CV.
         | 
         | https://wiki.sipeed.com/hardware/en/tang/Tang-Nano-4K/Nano-4...
         | 
         | https://www.aliexpress.us/item/3256806880637138.html
        
           | unwind wrote:
           | Cool board!
           | 
           | Would any of the "retro" game/home computer firmwares fit in
           | that FPGA? I find comparing capacity hard for stuff like
           | that.
        
             | hi-v-rocknroll wrote:
             | There's absolutely no reason ROMs have to waste scarce
             | resources of a hybrid FPGA. Micro SD cards (called TF in
             | China) and eMMC are the usual solutions.
             | 
             | Example:
             | https://www.aliexpress.us/item/3256806498688867.html
        
             | londons_explore wrote:
             | Yes, easily, but unless someone has done it already,
             | 'porting' them to this board would be a lot of work.
        
           | 3abiton wrote:
           | I wish I had the time to tinker with these bad boys
        
         | amelius wrote:
         | How many fps can that project do?
        
         | julius wrote:
         | Oh wow TIL ESP32 can run TensorFlowLite. Person detection in
         | 54ms!
         | https://github.com/espressif/esp-tflite-micro?tab=readme-ov-...
        
       | picture wrote:
       | Also see this short post about SIMD on ESP32-S3, discussed
       | previously.
       | https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-...
        
       | rurban wrote:
       | We prefer something more expensive and better:
       | https://up-board.org/upsquared/specifications/
       | 
       | Intel UpSquared
        
         | rowanG077 wrote:
         | More expensive, sure. But better is pretty rich considering it
         | is Intel. My money is on this platform just evaporating in the
         | next 5 years. ESP32 has proven you can rely on supply and
         | longevity.
        
       | DeathArrow wrote:
       | >For silicon that's cheaper than the average coffee, that's
       | pretty cool.
       | 
       | Maybe it's not the chip that's too cheap. Maybe it's the
       | coffee that's too expensive.
        
         | mppm wrote:
         | OTOH, I've been waiting for disposable coffee cups with OLED-
         | based video ads ever since Minority Report. But tech progress
         | is just too damn slow :P
        
           | yjftsjthsd-h wrote:
           | I dunno about OLED, but now that you say it the costs do make
           | some sort of "smart" coffee disturbingly plausible.
        
           | surfingdino wrote:
           | Almost there...
           | https://www.moveelectric.com/e-motorbikes/super-soco-aims-se...
        
         | throwaway211 wrote:
         | Drink more microcontrollers.
        
         | rhelz wrote:
         | > Maybe it's the coffee that's too expensive.
         | 
         | Ha, well, there is a disturbing reason why computer vision with
         | ultra-cheap hardware is possible: countries all over the world
         | are buying these by the billions in order to keep an eye on
         | their citizens :-(
         | 
         | Big brother is enabling incredible economies of scale....
        
         | jacoblambda wrote:
         | I wish but tbh coffee is probably artificially cheaper than it
         | really should be since larger corporations exploit local farms
         | and effectively maintain local monopolies where farms have to
         | sell to said corporations for a fraction of the price it's
         | actually worth.
        
       | ladyanita22 wrote:
       | Could anyone with experience using Rust on ESP32 controllers
       | chime in on whether this is feasible in Rust as well?
        
         | f_devd wrote:
         | It is possible; it mainly depends on LLVM/clang support, as
         | inline assembly in Rust is very easy to do.
        
         | Qwuke wrote:
         | Compared to ESP8266, there's generally pretty good ESP32
         | support for Rust, but you'll likely need to pull in your C++
         | toolchain if you want to use the standard library. no-std in
         | Rust for ESP32 isn't terrible in my experience, though, just
         | not as fleshed out - particularly for hooking into components
         | like wifi/networking and probably a camera as well.
         | 
         | Like the other commenter said, there's plenty of support for
         | SIMD and asm in Rust.
         | 
         | You might ask around on a Rust embedded or Rust ESP32 chatroom
         | before making the dive.
        
           | the__alchemist wrote:
           | You can actually use the IDF system in Rust to use the std
           | lib, at least on ESP32-C3. Probably others too.
           | 
           | If you are on Windows, you will need to place the project
           | folder at the top level drive directory, and there are other
           | quirks as well, but it works.
        
       | sylware wrote:
       | Yep, SIMD seems to win the race vs SMT for that type of
       | processing.
        
       | dansitu wrote:
       | If you're interested in this stuff and wanna try it yourself,
       | check out our product, Edge Impulse:
       | 
       | https://edgeimpulse.com/ai-practitioners
       | 
       | We work directly with vendors to perform low level optimization
       | of deep learning, computer vision, and DSP workloads for dozens
       | of architectures of microcontrollers and CPUs, plus exotic
       | accelerators (neuromorphic compute!) and edge GPUs. This includes
       | ESP32:
       | 
       | https://docs.edgeimpulse.com/docs/edge-ai-hardware/mcu/espre...
       | 
       | You can upload a TensorFlow, PyTorch, or JAX model and receive an
       | optimized C++ library direct from your notebook in a couple lines
       | of Python. It's honestly pretty amazing.
       | 
       | And we also have a full Studio for training models, including
       | architectures we've designed specifically to run well on various
       | embedded hardware, plus hardware-aware hyperparameter
       | optimization that will find the best model to fit your target
       | device (in terms of latency and memory use).
        
         | TheMagicHorsey wrote:
         | Yo! This is awesome stuff!
        
           | dansitu wrote:
           | Thank you! We're trying to bring embedded ML within reach of
           | all engineering teams and domain experts.
           | 
           | Previously you needed a crazy mixture of ML knowledge and
           | low-level embedded engineering skills even to get started,
           | which is not a common occurrence!
        
         | RobotToaster wrote:
         | I don't think the output from this can be used in any open
         | source project due to the community plan restrictions, FYI.
        
       | robxorb wrote:
       | I wonder how hard it would be, presumably with some trade-off
       | with detection windows, to use a few of these in parallel and
       | process higher resolutions and frame rates?
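
One way to picture the trade-off with detection windows is to split a frame into horizontal strips, one per chip, with a few rows of overlap. Because the FAST circle reaches 3 pixels from the center, each strip needs 3 extra rows on each shared border so that corners near the seam are not missed. The sketch below is a hypothetical partitioning helper only; how each chip would actually receive its rows (camera bus sharing, a splitter, and so on) is out of scope and not covered by the thread.

```python
FAST_RADIUS = 3  # the FAST circle reaches 3 pixels from the center


def split_rows(height, workers, halo=FAST_RADIUS):
    """Split image rows among workers.

    Returns one (read_start, read_end, own_start, own_end) tuple per worker;
    ends are exclusive. Each worker reads its own rows plus up to `halo`
    extra rows on each side, but reports corners only for the rows it owns,
    so no corner is reported twice.
    """
    base, extra = divmod(height, workers)
    strips = []
    own_start = 0
    for i in range(workers):
        # Spread any remainder rows over the first `extra` workers.
        own_end = own_start + base + (1 if i < extra else 0)
        read_start = max(0, own_start - halo)
        read_end = min(height, own_end + halo)
        strips.append((read_start, read_end, own_start, own_end))
        own_start = own_end
    return strips


for strip in split_rows(480, 4):
    print(strip)
```

For a 480-row frame and four workers, each interior worker reads 126 rows to own 120, so the overlap costs about 5% extra pixel work per chip. That overhead grows as strips get thinner, which limits how far this kind of scaling can go.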
        
       ___________________________________________________________________
       (page generated 2024-06-25 23:01 UTC)