[HN Gopher] YOLOv5 on FPGA with Hailo-8 and 4 Pi Cameras
       ___________________________________________________________________
        
       YOLOv5 on FPGA with Hailo-8 and 4 Pi Cameras
        
       Author : geerlingguy
       Score  : 123 points
       Date   : 2024-05-31 03:19 UTC (19 hours ago)
        
 (HTM) web link (www.fpgadeveloper.com)
 (TXT) w3m dump (www.fpgadeveloper.com)
        
       | zimpenfish wrote:
       | That looked interesting as a self-hostable project until it got
       | to the requirement for a $3200 AMD board. Maybe the price will
       | come down one day...
        
         | DoingIsLearning wrote:
          | The UltraScale is definitely pricey for non-industry
         | applications. The FPGA design is probably larger because of the
         | 4x camera pipeline.
         | 
          | Perhaps with a single camera you could port this to fit on a
          | Zynq 7000 footprint with something like a Pynq Z1 or Numato
          | Styx, which are around the $250 hobbyist price point.
        
         | dailykoder wrote:
          | At the moment I am wondering if I could build an accelerator
          | for George Hotz's tinygrad[1] with cheap FPGAs (I do have an
          | Arty A7 35T; that might be too small, I guess?). According to
          | the readme it should be "easy" to add new accelerator
          | hardware. Sadly my knowledge of the Python machine-learning
          | ecosystem is still a bit limited, but if I understand it
          | correctly you "just" need an OpenCL kernel and need to be able
          | to shove the data back and forth somehow.
         | 
         | Didn't have enough time to dive into it yet and still working
         | on some other project, but this still tickles the back of my
         | head and would be cool even if I could only run mnist on it.
         | 
         | - [1] https://github.com/tinygrad/tinygrad/
        
           | gh02t wrote:
            | > but if I understand it correctly you "just" need an
            | OpenCL kernel and need to be able to shove the data back
            | and forth somehow.
           | 
            | To use it with an FPGA accelerator you also have to build
            | all the "hardware" to run said OpenCL kernel efficiently,
            | manage data transfer, talk to the host, etc. on the FPGA
            | side. This is very foreign if you're only used to software
            | design, and still very nontrivial even if you've done FPGA
            | work, though I think there are some open hardware projects
            | around doing this.
        
         | sorenjan wrote:
          | Depending on what you're trying to do, you might not need it.
          | A popular option for object detection in home-hosted video
          | surveillance (using Frigate and Home Assistant) is the Coral
          | TPU with a Raspberry Pi or some decommissioned thin client.
        
           | muxamilian wrote:
            | Google Coral seems abandoned by Google. Nothing official,
            | but the last news on their page is from May 2022.
        
             | geerlingguy wrote:
             | I've heard from the folks at Pineboard[1] there's been some
             | new activity, as this year manufacturing ramped back up...
             | it's still decent hardware but is getting long in the
             | tooth.
             | 
             | Regarding the Hailo featured in the article, a few of us
             | have been messing with it on a Raspberry Pi 5[2], and it
             | offers more performance in a similar power envelope. The
             | major downside is availability. I can buy the Coral on many
             | electronics supplier sites, but Hailo seems to be selling
             | through 'Product inquiry' right now, which is not easy to
             | navigate as an individual!
             | 
             | [1] https://pineboards.io
             | 
             | [2] https://pipci.jeffgeerling.com/cards_m2/hailo-8-ai-
             | module.ht...
        
               | algo_trader wrote:
                | Any good write-ups on Hailo and similar?
                | 
                | BTW, is it possible to have 10 of these connected to a
                | single board/CPU?
        
           | zimpenfish wrote:
            | I did consider the Intel Neural Compute Stick to accelerate
            | with OpenVINO, but they're discontinued. It turns out I can
            | get away with doing fewer detections (birds, not people) by
            | pre-filtering with motion detection, which reduces the
            | number of frames going through OpenCV's DNN by 10x.
        
       | scottapotamas wrote:
       | Great writeup. Always a treat to see more high quality FPGA
       | project postmortems, even if they aren't using accessible
       | parts/toolchains.
        
       | daghamm wrote:
       | I am not familiar with this NN accelerator.
       | 
       | Does anyone have a comparison between Hailo and, say, a mid or
       | high-end GPU or a TPU?
        
         | michaelt wrote:
          | According to [1] the manufacturers claim 2% of the TOPS of an
          | RTX 4090, and only 0.8% of the power consumption.
         | 
         | $200 in prototype quantities [2] which is 12% the price of a
         | 4090 - but perhaps the price drops when you order them in bulk?
         | 
         | They claim it compares favourably to an 'Nvidia Xavier NX' for
         | image classification tasks, providing somewhat more FPS at
         | significantly lower power consumption. 218 fps running YOLOv5m
         | on 640x640 inputs.
         | 
         | They're completely silent about the amount of memory it has,
         | but you can fit int8 YOLOv5m into about 20 MB so it'll
         | certainly be an amount measured in megabytes rather than
         | gigabytes.
         | 
          | Their target market is "CCTV camera that tracks cars and
          | people" rather than "run an LLM" or "train a network from
          | scratch".
         | 
         | [1] https://hailo.ai/products/ai-accelerators/hailo-8-ai-
         | acceler...
         | 
         | [2] https://up-shop.org/hailo-m2-key.html
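Taking the comment's own figures at face value (2% of a 4090's TOPS at 0.8% of its power and 12% of its price), the relative-efficiency arithmetic works out as follows:

```python
# Claimed fractions relative to an RTX 4090, from the figures above.
tops_frac = 0.02    # throughput
power_frac = 0.008  # power consumption
price_frac = 0.12   # $200 in prototype quantities

perf_per_watt_ratio = tops_frac / power_frac    # ~2.5x the 4090's TOPS/W
perf_per_dollar_ratio = tops_frac / price_frac  # ~0.17x the 4090's TOPS/$
```

So on these claimed numbers the chip wins clearly on efficiency but loses on raw performance per dollar, which fits the edge-inference positioning.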
        
           | muxamilian wrote:
            | From what I know it doesn't have memory but streams
            | everything to the chip. So there's no limit on the size of
            | the neural network (unlike the Google Coral).
        
             | michaelt wrote:
             | Maybe? If a 20 MB network achieves 218 fps that'd need 4.3
             | GB/s of bandwidth just to stream the network, completely
             | ignoring the images. And they use PCIe Gen3 which is 1 GB/s
             | per lane, with different products having 1, 2 or 4 lanes.
             | 
             | So... maybe just about?
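Spelling out the back-of-envelope above (all figures come from the comments, not a datasheet):

```python
weights_mb = 20   # int8 YOLOv5m weight size, per the earlier comment
fps = 218         # claimed YOLOv5m throughput at 640x640
lane_gb_s = 1.0   # rough usable PCIe Gen3 bandwidth per lane

# Bandwidth needed to restream the full weight set every frame,
# ignoring the input images entirely.
stream_gb_s = weights_mb / 1000 * fps   # about 4.4 GB/s
lanes_needed = stream_gb_s / lane_gb_s  # slightly more than 4 Gen3 lanes
```

Which is why "maybe just about" is a fair read: a x4 Gen3 link sits right at that number, so either some weights stay resident on-chip or the effective streaming rate is lower than this worst case.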
        
             | geerlingguy wrote:
             | Some of their marketing mentions no need for external DRAM,
             | for example, from this CNX article[1]:
             | 
             | "One of the key reasons for the performance improvement is
             | that RAM is self-contained without the need for external
             | DRAM like other solutions. This decreases latency a lot and
             | reduces power consumption."
             | 
             | Not sure how much RAM is included on the chip, but I'm also
             | thinking in the tens of MB range, certainly not gigabytes.
             | 
             | [1] https://www.cnx-software.com/2020/10/07/learn-more-
             | about-hai...
        
               | alandarev wrote:
                | PCIe is slower than DRAM, so not having DRAM is not a
                | performance improvement but a hardware limitation that
                | reduces the scope of usability. That doesn't make it
                | bad, though - it's cheaper to produce and serves its
                | own use cases.
                | 
                | Processing a stream of data - yes. Machine learning on
                | a large set of data - no.
        
         | muxamilian wrote:
         | It's a competitor to Google Coral (seems abandoned) and NVIDIA
         | Jetson. I've been using it for more than a year and the
         | hardware seems to be one of the best on the market. The
         | software (how to actually do inference on the chip) is subpar
         | though.
        
         | brk wrote:
          | Hailo is one of the newer AI accelerator startups, so it's
          | not surprising that many people haven't heard of them yet. So
          | far the price/performance/power consumption of the Hailo
          | products seems to fill a rather large gap between the Amba
          | stuff, which is very well suited for 1-4 camera streams in a
          | typical SoC-based device, and the Jetson, which is really
          | kind of overpriced and power hungry for a lot of video
          | applications (at least IMO).
        
         | Y_Y wrote:
          | Very cheap and power-efficient if you're willing to run in
          | int8/int4 and have unlimited time and patience for
          | development.
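For context, "running in int8" typically means something like the symmetric post-training quantization below; this is a generic illustration, not Hailo's actual scheme:

```python
def quantize_int8(weights):
    """Symmetric linear quantization of a list of floats to int8.
    Returns (q, scale) such that w is approximately q * scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized values."""
    return [v * scale for v in q]
```

The "unlimited time and patience" part is everything around this: choosing per-layer scales, calibrating on representative data, and clawing back the accuracy the rounding throws away.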
        
       | globalnode wrote:
        | Some of the people didn't look too happy about you filming
        | them, then you went and put them online for the world to see.
        | Classy.
        
         | bigyikes wrote:
         | Breaking: people in public don't want to be seen by the public.
         | 
         | It's a detailed breakdown of a technically impressive project
         | and your main takeaway is the 5 seconds in the demo where a guy
         | covers his face?
         | 
         | Kudos to the author for making something neat and sharing it.
        
           | shaky-carrousel wrote:
           | Breaking: people in public don't want to be recorded and
           | uploaded on the internet for millions to see.
           | 
           | Crazy, right?
        
             | Workaccount2 wrote:
             | Well yeah, it is kind of crazy.
             | 
              | Any scenario in which undue harm comes to someone because
              | they were a passerby on the street in a video is such a
              | reach that you have to question how grounded in reality
              | they are. It's some deep "I'm the main character" level
              | thinking.
        
         | IncreasePosts wrote:
         | Oh, the horror. Holy shit was that Jason walking by? He told me
         | he was in Alaska...I'm going to need to have a talk with him.
         | Thank goodness I ran across this video.
        
         | yazzku wrote:
          | A good point that shouldn't be dismissed so quickly. This
          | tracking also has little use beyond surveillance, so one has
          | to wonder what the author thinks they are doing, or why they
          | think it's interesting or useful. That they then go ahead and
          | film a crowd without their consent says more about their
          | position on this moral question than it does about any direct
          | harm to the people in the video.
        
           | alandarev wrote:
           | That's how they score gov contracts - the safest and quickest
           | way to get rich. And oh boy do governments love control
        
       | shrubble wrote:
       | What is the ultimate delivery after all this work? Did it
       | correlate/track the same people across multiple video feeds, for
       | instance?
        
         | gte525u wrote:
          | It's line-speed processing of multiple cameras in hardware -
          | it should consume less power than an equivalent GPU or
          | Jetson.
        
       | throwaway2562 wrote:
       | This makes me think of monkeys cheerfully building their own
       | cage.
       | 
       | Sorry, but I can't just 'cool-project-bro' this one. Does nobody
       | else have the faintest misgivings about where we're at right now:
       | human surveillance as just scratching a technical itch?
       | 
       | Apologies if this comes over all grumpy, but wow. Seriously.
        
       | _giorgio_ wrote:
       | What's a good (production ready) setup?
       | 
       | I'm thinking of an external camera (weather resistant) and
       | hardware. The hardware could be a small computer that connects to
       | the camera (maybe with wifi?), and runs the YOLO model.
        
         | brk wrote:
         | That already exists. Lilin is one of several CCTV companies
         | implementing on-camera YOLO. Axis, Hanwha, and i-Pro all have
         | options for you to run your own software/models on camera as
         | well.
        
           | _giorgio_ wrote:
           | Ok, thanks. I'll look into that. I hope that they don't
           | require some specific abstruse format for the models!
        
       | eurekin wrote:
       | After years of wondering, I have to ask.
       | 
        | What are the actual real-life use cases for this tech?
       | 
       | I can imagine in manufacturing: detecting defects or layout
       | mismatch - that's one.
       | 
        | Is there any open source project that uses an image recognition
        | library to achieve a useful task? All I've seen from board
        | partners seems to at most provide very simple demos, where a
        | box with a label is drawn around an object. Who actually is
        | using that information, how, and for what?
        | 
        | I've also been a part of the Kinect craze and made 3 demos
        | (games mostly) using their SDK, and I still have a very hard
        | time defending this tech in the eyes of coworkers who only see
        | it as surveillance tech.
        
         | mechagodzilla wrote:
          | As you guessed, high-speed machine vision is frequently used
          | in manufacturing settings for sorting or various quality
          | control tasks. Imagine picking out bad potatoes on a conveyor
          | belt moving tens of potatoes per second, or identifying
          | particle counts and size distributions in a stream of water
          | to gauge water quality.
        
           | eurekin wrote:
            | Plus, being an NN, it might be possible to detect a foreign
            | object with relative ease (compared to classic computer
            | vision) - like a rat.
        
             | gessha wrote:
              | Behavior outside of the training distribution is
              | undefined and more often than not undesirable. NNs work
              | well on stuff they're trained on.
        
         | dekhn wrote:
         | I use object detection to track tardigrades in my custom
         | motorized microscope. It's very useful for making long
         | observations in a field much larger than the scope's field of
         | view.
         | 
         | The system works quite simply: I start with an existing object
         | detector and train it with a small (<100) number of manually
         | labelled images. Then during inference, I move the scope's
         | field of view using motor commands to put the center of the
         | tardigrade at the center of the field of view.
         | 
         | This technology is very useful for doing long-term observations
         | of tardigrades (so, useful for science).
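The "move the scope to re-center the detection" step described above amounts to a proportional controller on the bounding-box center. The sketch below is hypothetical - the function name, coordinate convention, and steps_per_pixel gain are all made up for illustration, not taken from the commenter's system:

```python
def centering_steps(bbox, frame_w, frame_h, steps_per_pixel=0.5):
    """Given a detection bbox (x0, y0, x1, y1) in pixel coordinates,
    return (dx, dy) motor steps that move the stage so the bbox center
    lands at the center of the field of view."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Pixel error between the frame center and the detection center.
    err_x = frame_w / 2 - cx
    err_y = frame_h / 2 - cy
    return err_x * steps_per_pixel, err_y * steps_per_pixel
```

Running this once per inference frame gives the closed loop: detect, compute the offset, nudge the stage, repeat.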
        
           | eurekin wrote:
            | Thank you! Is the detection accurate enough, or are the
            | observations simply not that sensitive to minor errors?
            | 
            | That makes me want to revisit my previous idea: a boiling-
            | soup spillage detector. I was once on a Google Meet call
            | while a pot of soup was cooking, keeping an eye on it, and
            | thought, heck, that seems like a nice exercise for fine-
            | tuning a visual detector.
        
             | dekhn wrote:
              | The detection was accurate enough for me to complete one
              | prototype experiment under controlled conditions: a
              | single tardigrade in an otherwise empty field, and even
              | then it did lose the tardigrade once or twice. Different
              | lighting conditions, and other things in the field like
              | tardigrade eggs, algae, and dirt, all make it more
              | challenging.
             | 
             | To make it truly ready for production science, I'd need to
             | put more work into making the model robust. I'd also like
             | better object tracking, so I could track multiple unique
             | tardigrades.
             | 
             | If you want to see even better examples, take a look at
             | DeepLabCut, https://www.mackenziemathislab.org/deeplabcut
             | especially the video examples.
        
         | VTimofeenko wrote:
         | Frigate uses models like this one for NVR:
         | 
         | https://frigate.video/
        
         | yazzku wrote:
         | To surveil people in the streets.
        
           | eurekin wrote:
            | Yeah, that's what I'm afraid of.
        
         | daemonologist wrote:
         | I'm working on a project that detects climbing holds and lets
         | you set routes by selecting them. (The usual method is putting
         | a bit of colored tape on each hold, where the color corresponds
         | to a route. This works great but becomes difficult to read once
         | more than four or five routes share a hold.) YOLO made the
         | computer vision part of this pretty smooth sailing.
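Once YOLO has produced hold bounding boxes, the "selecting them" step can be as simple as a point-in-box hit test on a click or tap. This is an illustrative sketch, not the project's actual code:

```python
def pick_hold(click_xy, boxes):
    """Return the index of the hold whose bbox (x0, y0, x1, y1) contains
    the click, preferring the smallest box when several overlap (so a
    small hold in front of a big volume stays selectable); None if the
    click hits nothing."""
    x, y = click_xy
    hits = [(i, (x1 - x0) * (y1 - y0))
            for i, (x0, y0, x1, y1) in enumerate(boxes)
            if x0 <= x <= x1 and y0 <= y <= y1]
    return min(hits, key=lambda h: h[1])[0] if hits else None
```

Selected indices can then be tagged with a route color, replacing the colored tape entirely.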
        
           | adolph wrote:
            | Embed the holds with an LED and IR sensor a la the Swift
            | concert[0] and you've got the whole package.
           | 
           | 0. https://news.ycombinator.com/item?id=40492515
        
         | sachin9 wrote:
         | I've been very persistent over the past few months in
         | developing a system for agriculture as a primary use case. I
         | want to deploy features to classify crop type, height,
         | vegetation stage, and other important metrics to achieve real-
         | time or near real-time analytics.
         | 
          | Do you have any suggestions on how to proceed further? So
          | far, I've procured a Jetson, five cameras, a stand to fix and
          | calibrate the modules, and a camera array HAT to connect four
          | cameras to the Jetson. I was checking out VPUs and NPUs and
          | other hardware as well but am struggling to identify
          | compatible hardware. How can I build such a model to test and
          | validate within 3 months?
        
         | nickpsecurity wrote:
          | Field mice that I thought were moles have destroyed my yard.
          | There are so many tunnels that I can't tell which are most
          | active. A camera AI that could show which parts of the ground
          | changed significantly would be nice.
          | 
          | At a hotel, we had a problem with luggage carts going
          | missing. There are a few ways to deal with that. A generic
          | one that would support other use cases would be to let the
          | camera tell you the last room each cart went into. Likewise,
          | outdoor cameras might tell you which vehicles had a customer
          | walk into the hotel and which might be non-guests.
        
         | zerojames wrote:
         | Great question! I work for a computer vision company (Roboflow)
         | and have seen computer vision used for everything from accident
         | prevention on critical infrastructure to identifying defects on
         | vehicle parts to detecting trading cards for use in video game
         | applications.
         | 
          | Drawing bounding boxes is a common end point for demos, but
          | for businesses using computer vision there is an entire world
          | after that: on-device deployment. This can range from devices
          | like an NVIDIA Jetson (a very common choice) to Raspberry Pis
          | to central CUDA GPU servers processing large volumes of data
          | (maybe connected to cameras over RTSP).
         | 
          | Note: There are many models that are faster and perform
          | better than YOLOv5 (e.g. YOLOv8, YOLOv10, PaliGemma). Roboflow
         | Inference that our ML team maintains has various guides on
         | deploying models to the edge:
         | https://inference.roboflow.com/#inference-pipeline
        
           | alandarev wrote:
           | Can you go into some examples?
        
       ___________________________________________________________________
       (page generated 2024-05-31 23:01 UTC)