[HN Gopher] Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant
       ___________________________________________________________________
        
       Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant
        
        Pi-card is an AI-powered voice assistant running locally on a
        Raspberry Pi. It is capable of doing anything a standard LLM (like
        ChatGPT) can do in a conversational setting. In addition, if a
        camera is attached, you can also ask Pi-card to take a photo,
        describe what it sees, and then ask questions about that. It uses
        distributed models, so latency is something I'm working on, but I
        am curious about where this could go, if anywhere. Very much a
        WIP. Feedback welcome :-)
        
       Author : nkaz123
       Score  : 126 points
       Date   : 2024-05-13 19:03 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | harwoodr wrote:
       | I see that a speaker is in the hardware list - does this speak
       | back?
        
         | nkaz123 wrote:
          | Yes! I'm currently using https://espeak.sourceforge.net/, so
          | it isn't especially fun to listen to.
         | 
          | Additionally, since I'm streaming the LLM response, it won't
          | take long to get your reply. Since it speaks a chunk at a
          | time, occasionally only parts of words are said momentarily.
          | How long you need to wait also depends, of course, on which
          | model you use and on the context size.
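
The chunk-at-a-time behavior described above can be sketched as follows: buffer streamed tokens and only flush whole-word phrases to the TTS engine, so a partial word is never spoken. This is a minimal sketch, not the project's actual code; function names are illustrative, and the `speak` helper assumes espeak is installed.

```python
import subprocess

def chunks_to_phrases(token_stream, min_words=3):
    """Buffer streamed LLM tokens and yield only whole-word phrases,
    so the TTS engine never receives a partial word."""
    buf = ""
    for tok in token_stream:
        buf += tok
        # Flush only up to the last space, so a split word stays buffered.
        if buf.count(" ") >= min_words:
            phrase, _, rest = buf.rpartition(" ")
            yield phrase
            buf = rest
    if buf.strip():
        yield buf.strip()

def speak(text):
    # espeak reads its positional argument aloud; requires espeak installed.
    subprocess.run(["espeak", text], check=True)
```

A streaming loop would then call `speak(phrase)` for each yielded phrase instead of speaking every raw chunk.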
        
       | eddieroger wrote:
        | > Why Pi-card?
        | 
        | > Raspberry Pi - Camera Audio Recognition Device.
       | 
       | Missed opportunity for LCARS - LLM Camera Audio Recognition
       | Service, responding to the keyword "computer," naturally. I guess
       | if this ran elsewhere from a Pi, it could be LCARS.
        
         | rkagerer wrote:
          | Pi-C.A.R.D is perfect. I read it 100% as Picard, and it's
          | more recognizable than LCARS.
        
           | orthecreedence wrote:
           | Just configure it to respond to "Computer" and you're good to
           | go.
        
             | nkaz123 wrote:
              | The wake word detection is an interesting problem here.
              | As you can see in the repo, I have a lot of mis-heard
              | versions of the wake word in place, the wake word in this
              | case being "Raspberry". Since the system heats up fast
              | you need a fan, and with the microphone plugged directly
              | into a USB port next to the fan, I needed something
              | distinct; "computer" wasn't cutting it.
             | 
              | Changing the transcription model to something a bit
              | better, or moving the mic away from the fan, could help
              | here.
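
The mis-heard-variant approach described above can be sketched like this. The variant list here is hypothetical; the real list lives in the Pi-card repo.

```python
# Hypothetical mis-hearings of the wake word "Raspberry"; the real
# list is maintained in the Pi-card repo.
WAKE_VARIANTS = {"raspberry", "rasp berry", "razz berry", "respberry"}

def heard_wake_word(transcript, variants=WAKE_VARIANTS):
    """Return True if any known (mis-)transcription of the wake word
    appears in the transcribed audio."""
    text = transcript.lower()
    return any(v in text for v in variants)
```

This tolerates a noisy transcription model at the cost of a slightly higher false-positive rate, which is the trade-off the comment above alludes to.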
        
             | thesnide wrote:
             | "Number One" would be my code word...
        
               | pimeys wrote:
               | And finally saying "make it so" to make the command
               | happen.
        
         | bdcravens wrote:
         | Or LLM Offline Camera, User Trained Understanding Speech
         | 
         | LOCUTUS
        
         | MisterTea wrote:
         | This is why we can't have nice LCARS things:
         | https://en.wikipedia.org/wiki/LCARS#Legal
        
         | layer8 wrote:
          | It should really be something like Beneficial Audio Realtime
          | Recognition Electronic Transformer.
        
       | rkagerer wrote:
       | _I wanted to create a voice assistant that is completely offline
       | and doesn 't require any internet connection. This is because I
       | wanted to ensure that the user's privacy is protected and that
       | the user's data is not being sent to any third party servers._
       | 
       | Props, and thank you for this.
        
         | ornornor wrote:
         | Ditto!
        
         | pyaamb wrote:
          | I would love for Apple/Google to introduce some tech that
          | would make it provable/verifiable that the camera/mic on the
          | device can only capture when the indicator is on, and that
          | it isn't possible for apps, or even higher layers of the OS,
          | to spoof this.
        
           | herval wrote:
            | That's allegedly the case in iOS (not the provable part,
            | but I wonder if anyone has managed to disprove it yet?)
        
             | pyaamb wrote:
              | I'm thinking perhaps a standardized open-design circuit
              | that you can view by opening up the back cover and
              | zooming in with a microscope.
              | 
              | I feel like privacy tech like this, which once seemed
              | wildly overkill for everyday users, becomes necessary as
              | the value of collecting data and profiling humans goes
              | through the roof.
        
           | abraae wrote:
           | Red nail polish would like a word
        
             | pyaamb wrote:
             | I missed the reference. Red nail polish?
        
       | cmcconomy wrote:
       | Funny, I just picked up a device for use with
       | https://heywillow.io for similar reasons
        
         | knodi123 wrote:
          | Me too, but I bricked mine when flashing the BIOS. Just a
          | fluke, nothing to be done about it.
        
       | aci_12 wrote:
        | How does the wake word work? Does it keep listening and ignore
        | the audio if the last few seconds don't contain the wake
        | word/phrase?
        
         | knodi123 wrote:
          | That's the general idea, yes. Or rather: store several
          | chunks of audio and discard the oldest, aka a "rolling
          | window".
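
The rolling-window idea can be sketched with a fixed-length deque: keep only the last few audio chunks, transcribe the joined window, and check for the wake word. This is a sketch under assumptions; the window size and the `transcribe` callable stand in for whatever the real pipeline uses.

```python
from collections import deque

WINDOW_CHUNKS = 3  # assumption: keep roughly the last few seconds of audio

# Appending past maxlen automatically discards the oldest chunk.
window = deque(maxlen=WINDOW_CHUNKS)

def process_chunk(chunk, transcribe, wake_word="raspberry"):
    """Add the newest audio chunk, transcribe the whole window,
    and report whether the wake word was heard."""
    window.append(chunk)
    text = transcribe(b"".join(window))
    return wake_word in text.lower()
```

The `maxlen` deque is what makes this a rolling window: no explicit eviction code is needed, and memory use stays constant while listening indefinitely.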
        
       | dasl wrote:
       | What latency do you get? I'd be interested in seeing a demo
       | video.
        
         | nkaz123 wrote:
          | It fully depends on the model and on how much conversational
          | context you provide, but if you keep things to a bare
          | minimum, it's under ~5 seconds from message received to
          | starting the response using Llama 3 8B. I'm also using a
          | vision language model, https://moondream.ai/, but that takes
          | around 45 seconds, so the next idea is to take a more basic
          | image captioning model, insert its output into the context,
          | and try to cut that time down even more.
         | 
         | I also tried using Vulkan, which is supposedly faster, but the
         | times were a bit slower than normal CPU for Llama CPP.
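
Time-to-first-token numbers like the ~5 seconds quoted above can be measured with a small harness. This is a sketch; `generate` is an assumption standing in for any callable that returns a token iterator (e.g. a streaming llama.cpp call).

```python
import time

def first_token_latency(generate):
    """Time from issuing the request to receiving the first streamed
    token. `generate` is any zero-argument callable returning a token
    iterator (a placeholder for the real streaming LLM call)."""
    start = time.monotonic()
    stream = generate()
    first = next(stream)
    return first, time.monotonic() - start
```

Measuring to the *first* token is what matters for perceived responsiveness here, since the reply is spoken as it streams rather than after it completes.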
        
       | pawelduda wrote:
        | All I need is a voice assistant that:
        | 
        | - an RPi 4 can handle,
        | - I can integrate with HomeAssistant,
        | - is offline only and doesn't send my data anywhere.
        | 
        | This project seems to be ticking most, if not all, of the
        | boxes compared to anything else I've seen. Good job!
       | 
        | While at it, can someone drop a recommendation for an
        | RPi-compatible mic for an Alexa-like use case?
        
         | 8xeh wrote:
         | https://www.robotshop.com/products/respeaker-usb-microphone-...
        
         | baobun wrote:
         | Check out Rhasspy.
         | 
         | You won't get anything practically useful running LLMs on the
         | 4B but you also don't strictly need LLM-based models.
         | 
         | In the Rhasspy community, a common pattern is to do (cheap and
         | lightweight) wake-word detection locally on mic-attached
         | satellites (here 4B should be sufficient) and then stream the
         | actual recording (more computational resources for better
         | results) over the local network to a central hub.
        
         | yangikan wrote:
          | I've gotten good results with PlayStation 3 and PlayStation 4
          | cameras (which also have a mic). They are available for
          | $15-20 on eBay.
        
       | knodi123 wrote:
        | Is it possible to run this on a generic Linux box? Or if not,
        | are you aware of a similar project that can?
       | 
       | I've googled it before, but the space is crowded and the caveats
       | are subtle.
        
       ___________________________________________________________________
       (page generated 2024-05-13 23:00 UTC)