[HN Gopher] SIMD-accelerated computer vision on a $2 microcontro...
___________________________________________________________________
SIMD-accelerated computer vision on a $2 microcontroller
Author : shraiwi
Score : 280 points
Date : 2024-06-25 02:10 UTC (20 hours ago)
(HTM) web link (shraiwi.github.io)
(TXT) w3m dump (shraiwi.github.io)
| restricted_ptr wrote:
| I wonder if ESP32 has VLIW slots and a tighter instruction
| packaging is possible?
| duskwuff wrote:
| Neither Xtensa nor RISC-V is a VLIW architecture.
| restricted_ptr wrote:
| The Xtensa architecture is flexible and extendable by the user.
| The ability to define new instructions, hardware features, and
| VLIW configurations is one of its key features. You can find
| more details online:
| https://en.m.wikipedia.org/wiki/Tensilica
| thrtythreeforty wrote:
| Generally speaking, this is not correct. _Base_ Xtensa is not
| VLIW, but Xtensa's various vector extensions do allow VLIW
| instructions, collectively called "FLIX."
|
| It is doubtful that ESP32's Xtensa is VLIW-capable, though.
| Presumably their compiler would emit FLIX instructions if it
| were.
| westurner wrote:
| > _As I've been really interested in computer vision lately, I
| decided on writing a SIMD-accelerated implementation of the FAST
| feature detector for the ESP32-S3_ [...]
|
| > _In the end, I was able to improve the throughput of the FAST
| feature detector by about 220%, from 5.1MP/s to 11.2MP/s in my
| testing. This is well within the acceptable range of performance
| for realtime computer vision tasks, enabling the ESP32-S3 to
| easily process a 30fps VGA stream._
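|
| A quick sanity check of the quoted numbers (a sketch, not from the
| article; it assumes VGA means 640x480, 1 MP = 10^6 pixels, and it
| ignores camera capture and any other per-frame overhead):

```python
# Pixel rate needed for a 30 fps VGA stream versus the quoted throughput.
VGA_WIDTH, VGA_HEIGHT, FPS = 640, 480, 30

pixels_per_frame = VGA_WIDTH * VGA_HEIGHT
required_mp_s = pixels_per_frame * FPS / 1e6  # megapixels per second

baseline_mp_s = 5.1   # quoted throughput before SIMD
simd_mp_s = 11.2      # quoted throughput after SIMD

speedup = simd_mp_s / baseline_mp_s
baseline_fps = baseline_mp_s * 1e6 / pixels_per_frame
simd_fps = simd_mp_s * 1e6 / pixels_per_frame

print(f"required for 30 fps VGA: {required_mp_s:.3f} MP/s")
print(f"speedup: {speedup:.2f}x")
print(f"VGA frames per second: {baseline_fps:.1f} -> {simd_fps:.1f}")
```

| Under those assumptions a 30 fps VGA stream needs about 9.2 MP/s,
| which sits between the two quoted rates, and the quoted change
| works out to roughly a 2.2x speedup.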
|
| What are some use cases for FAST?
|
| Features from accelerated segment test:
| https://en.wikipedia.org/wiki/Features_from_accelerated_segm...
|
| Is there TPU-like functionality in anything in this price range
| of chips yet?
|
| Neon is an optional SIMD instruction set extension for ARMv7 and
| ARMv8, so the Pi 2 and later (and the Pi Zero 2 W) have SIMD
| extensions; the original ARMv6-based Pi Zero does not.
|
| Orin Nano has 40 TOPS, which is sufficient for Copilot+ AFAIU.
| "A PCIe Coral TPU Finally Works on Raspberry Pi 5"
| https://news.ycombinator.com/item?id=38310063
|
| From https://phys.org/news/2024-06-infrared-visible-device-2d-mat... :
|
| > _Using this method, they were able to up-convert infrared light
| of wavelength around 1550 nm to 622 nm visible light. The output
| light wave can be detected using traditional silicon-based
| cameras._
|
| > _"This process is coherent--the properties of the input beam
| are preserved at the output. This means that if one imprints a
| particular pattern in the input infrared frequency, it
| automatically gets transferred to the new output frequency,"
| explains Varun Raghunathan, Associate Professor in the Department
| of Electrical Communication Engineering (ECE) and corresponding
| author of the study published in Laser & Photonics Reviews._
|
| "Show HN: PicoVGA Library - VGA/TV Display on Raspberry Pi Pico"
| https://news.ycombinator.com/item?id=35117847#35120403
| https://news.ycombinator.com/item?id=40275530
|
| "Designing a SIMD Algorithm from Scratch"
| https://news.ycombinator.com/item?id=38450374
| shraiwi wrote:
| Thanks for reading!
|
| > What are some use cases for FAST?
|
| The FAST feature detector is an algorithm for finding regions
| of an image that are visually distinctive, which can be used as
| a first step in motion tracking and SLAM (simultaneous
| localization and mapping) algorithms typically seen in XR,
| robotics, etc.
|
| > Is there TPU-like functionality in anything in this price
| range of chips yet?
|
| I think that in the case of the ESP32-S3, its SIMD instructions
| are designed to accelerate the inference of quantized AI models
| (see: https://github.com/espressif/esp-dl), and also some
| signal processing like FFTs. I guess you could call the SIMD
| instructions TPU-like, in the sense that the chip has specific
| instructions that facilitate ML inference (EE.VRELU.Sx
| performs the ReLU operation). Using these instructions still
| takes CPU time, though, whereas TPUs are typically their own
| processing core, operating asynchronously. I'd say this is
| closer to ARM NEON.
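|
| To make the "one instruction, many lanes" idea concrete, here is a
| scalar Python emulation of a lane-wise ReLU (a sketch only: it
| assumes 128-bit vectors holding 16 signed 8-bit lanes and a plain
| max(x, 0); the exact semantics of EE.VRELU.Sx are not described in
| this thread, and the function names are made up for illustration):

```python
LANES = 16  # assumed: 16 x int8 = one 128-bit vector register


def relu_i8_lanes(lanes):
    """Apply max(x, 0) to each signed 8-bit lane of one vector."""
    if len(lanes) != LANES:
        raise ValueError(f"expected {LANES} lanes")
    if any(not -128 <= x <= 127 for x in lanes):
        raise ValueError("lane value out of int8 range")
    # On SIMD hardware all lanes are handled by a single instruction;
    # here the same work is spelled out one lane at a time.
    return [x if x > 0 else 0 for x in lanes]


def relu_i8(values):
    """Process a buffer one 16-lane vector at a time, like a SIMD loop."""
    if len(values) % LANES != 0:
        raise ValueError("buffer length must be a multiple of the lane count")
    out = []
    for i in range(0, len(values), LANES):
        out.extend(relu_i8_lanes(values[i:i + LANES]))
    return out
```

| The loop in relu_i8 runs once per 16 values rather than once per
| value, which is where the speedup over scalar code comes from; the
| work still runs on the CPU rather than on a separate core.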
| kylixz wrote:
| Interested in doing more of this type of work optimizing a
| SLAM/factorgraph pipeline?
|
| Email in bio and would love to chat!
| implements wrote:
| > The FAST feature detector is an algorithm for finding
| regions of an image that are visually distinctive, ...
|
| Is that related to 'Energy Function' in any way?
|
| (I ask because a long time ago I was involved in an Automated
| Numberplate Reading startup that was using an FPGA to quickly
| find the vehicle numberplate in an image)
| ska wrote:
| What you are thinking of operates at a different level of
| abstraction. Energy functions are a general way of
| structuring a problem, used (sometimes abused) to apply an
| optimization algorithm to find a reasonable solution for
| it.
|
| FAST is an algorithm for efficiently looking for
| "interesting" parts (basically, corners) of an image, so
| you can safely (in theory) ignore the rest of it. The
| output from a feature detector may end up contributing to
| an energy function later, directly or indirectly.
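|
| The segment test behind FAST can be sketched in a few lines of
| plain Python (an illustrative scalar version, not the article's
| SIMD code; the threshold of 20, the choice of n = 9 contiguous
| pixels, and the function names are assumptions for this example):

```python
# Offsets (dx, dy) of the 16 pixels on a radius-3 circle, in ring order.
CIRCLE = [
    (0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def is_fast_corner(img, x, y, threshold=20, n=9):
    """Return True if n contiguous ring pixels are all brighter than
    img[y][x] + threshold or all darker than img[y][x] - threshold."""
    p = img[y][x]
    states = []
    for dx, dy in CIRCLE:
        q = img[y + dy][x + dx]
        if q > p + threshold:
            states.append(1)    # brighter than the centre
        elif q < p - threshold:
            states.append(-1)   # darker than the centre
        else:
            states.append(0)    # similar to the centre
    # Repeat the first n-1 entries so runs that wrap around are found.
    extended = states + states[:n - 1]
    for target in (1, -1):
        run = 0
        for s in extended:
            run = run + 1 if s == target else 0
            if run >= n:
                return True
    return False


def detect_corners(img, threshold=20, n=9):
    """Test every pixel that has a full radius-3 ring inside the image."""
    height, width = len(img), len(img[0])
    return [
        (x, y)
        for y in range(3, height - 3)
        for x in range(3, width - 3)
        if is_fast_corner(img, x, y, threshold, n)
    ]
```

| With these settings a pixel on a straight edge sees a run of only
| about 7 differing ring pixels, so it is rejected, while a pixel at
| the tip of a right-angle corner sees 11 and is accepted.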
| westurner wrote:
| SimSIMD https://github.com/ashvardanian/SimSIMD :
|
| > _Up to 200x Faster Inner Products and Vector Similarity --
| for Python, JavaScript, Rust, C, and Swift, supporting f64,
| f32, f16 real & complex, i8, and binary vectors using SIMD
| for both x86 AVX2 & AVX-512 and Arm NEON & SVE_
|
| github.com/topics/simd: https://github.com/topics/simd
|
| https://news.ycombinator.com/item?id=37805810#37808036
| yatopifo wrote:
| > Is there TPU-like functionality in anything in this price
| range of chips yet?
|
| Kendryte K210 supports 1x1 and 3x3 convolutions on the "TPU".
| It was pretty well supported in terms of software &
| documentation but sadly it hasn't become popular.
|
| These days, you can easily find cheap RV1103 ("LuckFox"), BL808
| ("Ox64/Pine64") and CV1800B/SG2002 ("MilkV") based dev boards,
| all of which have some sort of basic TPU. Unfortunately, they
| are designed to be Linux boards, meaning that all TPU related
| stuff is extremely abstracted with zero under-the-hood
| documentation. So it's absolutely unclear whether their TPUs
| are real or faked with clever code optimizations.
| koerakoonlane wrote:
| > These days, you can easily find cheap RV1103 ("LuckFox"),
| BL808 ("Ox64/Pine64") and CV1800B/SG2002 ("MilkV") based dev
| boards, all of which have some sort of basic TPU.
| Unfortunately, they are designed to be Linux boards, meaning
| that all TPU related stuff is extremely abstracted with zero
| under-the-hood documentation. So it's absolutely unclear
| whether their TPUs are real or faked with clever code
| optimizations.
|
| They all have a TPU in hardware; my team has been verifying and
| benchmarking them. Documentation is only available for the
| high-level C APIs to the libraries that a programmer is
| expected to use, and even that tends to be extremely lacking.
| evanjrowley wrote:
| A comparable board is the ESP32-CAM, which is supported by this
| really practical computer vision project:
| https://github.com/jomjol/AI-on-the-edge-device?tab=readme-o...
| maven29 wrote:
| There is an ESP32-S3 version of this camera breakout board,
| which is presumably what OP might have used for prototyping.
|
| The S3 variant easily justifies the slight additional cost,
| given that it's easily faster by an order of magnitude or
| greater, having SIMD and an FPU.
|
| https://github.com/espressif/esp-dl/tree/master/examples/fac...
| hi-v-rocknroll wrote:
| In the CV department, I recently ordered a cheap FPGA + ARM
| Cortex-M3 + 64 Mbit SRAM + 32 Mbit flash that does camera input
| and HDMI output. Like a budget Zynq for CV.
|
| https://wiki.sipeed.com/hardware/en/tang/Tang-Nano-4K/Nano-4...
|
| https://www.aliexpress.us/item/3256806880637138.html
| unwind wrote:
| Cool board!
|
| Would any of the "retro" game/home computer firmwares fit in
| that FPGA? I find comparing capacity hard for stuff like
| that.
| hi-v-rocknroll wrote:
| There's absolutely no reason ROMs have to waste scarce
| resources of a hybrid FPGA. Micro SD cards (called TF in
| China) and eMMC are the usual solutions.
|
| Example:
| https://www.aliexpress.us/item/3256806498688867.html
| londons_explore wrote:
| Yes, easily, but unless someone has done it already,
| 'porting' them to this board would be a lot of work.
| 3abiton wrote:
| I wish I had the time to tinker with these bad boys
| amelius wrote:
| How many fps can that project do?
| julius wrote:
| Oh wow TIL ESP32 can run TensorFlow Lite. Person detection in
| 54ms! https://github.com/espressif/esp-tflite-micro?tab=readme-ov-...
| picture wrote:
| Also see this short post about SIMD on ESP32-S3, discussed
| previously.
| https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-...
| rurban wrote:
| We prefer something more expensive and better:
| https://up-board.org/upsquared/specifications/
|
| Intel UpSquared
| rowanG077 wrote:
| More expensive sure. But better is pretty rich considering it
| is Intel. My money is on this platform just evaporating in the
| next 5 years. Esp32 has proven you can rely on supply and
| longevity.
| DeathArrow wrote:
| >For silicon that's cheaper than the average coffee, that's
| pretty cool.
|
| Maybe it's not the chip that's too cheap. Maybe it's the
| coffee that's too expensive.
| mppm wrote:
| OTOH, I've been waiting for disposable coffee cups with
| OLED-based video ads ever since Minority Report. But tech progress
| is just too damn slow :P
| yjftsjthsd-h wrote:
| I dunno about OLED, but now that you say it the costs do make
| some sort of "smart" coffee disturbingly plausible.
| surfingdino wrote:
| Almost there...
| https://www.moveelectric.com/e-motorbikes/super-soco-aims-se...
| throwaway211 wrote:
| Drink more microcontrollers.
| rhelz wrote:
| > Maybe it's the coffee that's too expensive.
|
| Ha, well, there is a disturbing reason why computer vision with
| ultra-cheap hardware is possible: countries all over the world
| are buying these by the billions in order to keep an eye on
| their citizens :-(
|
| Big brother is enabling incredible economies of scale....
| jacoblambda wrote:
| I wish but tbh coffee is probably artificially cheaper than it
| really should be since larger corporations exploit local farms
| and effectively maintain local monopolies where farms have to
| sell to said corporations for a fraction of the price it's
| actually worth.
| ladyanita22 wrote:
| Could anyone with experience with Rust on ESP32 controllers chime
| in on whether this is feasible in Rust as well?
| f_devd wrote:
| It is possible; it mainly depends on LLVM/clang support, as
| inline asm in Rust is very easy to do.
| Qwuke wrote:
| Compared to ESP8266, there's generally pretty good ESP32
| support for Rust, but you'll likely need to bring in your C++
| toolchain if you want to use the standard library. no-std in
| Rust for ESP32 isn't terrible in my experience, though, just
| not as fleshed out - particularly for hooking into components
| like wifi/networking and probably a camera as well.
|
| Like the other commenter said, there's plenty of support for
| SIMD and asm in Rust.
|
| You might ask around on a Rust embedded or Rust ESP32 chatroom
| before making the dive.
| the__alchemist wrote:
| You can actually use the IDF system in Rust to use the std
| lib, at least on ESP32-C3. Probably others too.
|
| If you are on Windows, you will need to place the project
| folder at the top level drive directory, and there are other
| quirks as well, but it works.
| sylware wrote:
| Yep, SIMD seems to win the race vs SMT for that type of
| processing.
| dansitu wrote:
| If you're interested in this stuff and wanna try it yourself,
| check out our product, Edge Impulse:
|
| https://edgeimpulse.com/ai-practitioners
|
| We work directly with vendors to perform low level optimization
| of deep learning, computer vision, and DSP workloads for dozens
| of architectures of microcontrollers and CPUs, plus exotic
| accelerators (neuromorphic compute!) and edge GPUs. This includes
| ESP32:
|
| https://docs.edgeimpulse.com/docs/edge-ai-hardware/mcu/espre...
|
| You can upload a TensorFlow, PyTorch, or JAX model and receive an
| optimized C++ library direct from your notebook in a couple lines
| of Python. It's honestly pretty amazing.
|
| And we also have a full Studio for training models, including
| architectures we've designed specifically to run well on various
| embedded hardware, plus hardware-aware hyperparameter
| optimization that will find the best model to fit your target
| device (in terms of latency and memory use).
| TheMagicHorsey wrote:
| Yo! This is awesome stuff!
| dansitu wrote:
| Thank you! We're trying to bring embedded ML in reach of all
| engineering teams and domain experts.
|
| Previously you needed a crazy mixture of ML knowledge and
| low-level embedded engineering skills even to get started,
| which is not a common occurrence!
| RobotToaster wrote:
| I don't think the output from this can be used in any open
| source project due to the community plan restrictions, FYI.
| robxorb wrote:
| I wonder how hard it would be, presumably with some trade-off
| with detection windows, to use a few of these in parallel and
| process higher resolutions and frame rates?
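|
| One way to picture the trade-off: split the frame into horizontal
| strips, one per chip, and give each strip a few extra rows so
| pixels near a strip boundary still have their full neighbourhood.
| A minimal sketch (the function name and dictionary layout are made
| up for illustration; the 3-row margin assumes a detector like FAST
| that reads a radius-3 ring around each pixel):

```python
def split_rows(height, workers, margin=3):
    """Divide image rows among workers.

    Each worker owns the rows in "core" and additionally loads
    `margin` rows on either side (clamped to the image) in "load".
    Ranges are half-open: (start, end).
    """
    if workers < 1 or height < workers:
        raise ValueError("need at least one row per worker")
    base, extra = divmod(height, workers)
    strips = []
    core_start = 0
    for i in range(workers):
        # Spread any remainder rows over the first `extra` workers.
        core_end = core_start + base + (1 if i < extra else 0)
        strips.append({
            "core": (core_start, core_end),
            "load": (max(0, core_start - margin),
                     min(height, core_end + margin)),
        })
        core_start = core_end
    return strips


for strip in split_rows(480, 4):
    print(strip)
```

| Each worker reports only features whose row falls inside its core
| range, so nothing is counted twice. The overlap rows are the extra
| cost, and getting the pixels to each chip fast enough is a separate
| problem this sketch does not address.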
___________________________________________________________________
(page generated 2024-06-25 23:01 UTC)