[HN Gopher] Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant
___________________________________________________________________
Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant
Pi-card is an AI powered voice assistant running locally on a
Raspberry Pi. It is capable of doing anything a standard LLM (like
ChatGPT) can do in a conversational setting. In addition, if there
is a camera equipped, you can also ask Pi-card to take a photo,
describe what it sees, and then ask questions about that. It uses
distributed models so latency is something I'm working on, but I am
curious on where this could go, if anywhere. Very much a WIP.
Feedback welcome :-)
Author : nkaz123
Score : 126 points
Date : 2024-05-13 19:03 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| harwoodr wrote:
| I see that a speaker is in the hardware list - does this speak
| back?
| nkaz123 wrote:
| Yes! I'm currently using https://espeak.sourceforge.net/, so it
| isn't especially fun to listen to though.
|
| Additionally, since I'm streaming the LLM response, it won't
| take long to get your reply. Since it does it a chunk at a
| time, there's occasionally only parts of words that are said
| momentarily. Also of course depends on what model you use or
| what the context size is for how long you need to wait.
| eddieroger wrote:
| > Why Pi-card? > Raspberry Pi - Camera Audio Recognition Device.
|
| Missed opportunity for LCARS - LLM Camera Audio Recognition
| Service, responding to the keyword "computer," naturally. I guess
| if this ran elsewhere from a Pi, it could be LCARS.
| rkagerer wrote:
| Pi-C.A.R.D is perfect. Read it 100% as Picard, and more
| recognizable that LCARS.
| orthecreedence wrote:
| Just configure it to respond to "Computer" and you're good to
| go.
| nkaz123 wrote:
| The wake word detection is an interesting problem here. As
| you can see in the repo, I have a lot of mis-heard versions
| of the wake word in place, in this case being "Raspberry".
| Since the system heats up fast you need a fan, and with the
| microphone directly on a USB port next to the fan, I needed
| something distinct, and computer wasn't cutting it for
| this.
|
| Changing the transcription model to something a bit better
| or moving the mic away from the fan could help this happen.
| thesnide wrote:
| "Number One" would be my code word...
| pimeys wrote:
| And finally saying "make it so" to make the command
| happen.
| bdcravens wrote:
| Or LLM Offline Camera, User Trained Understanding Speech
|
| LOCUTUS
| MisterTea wrote:
| This is why we can't have nice LCARS things:
| https://en.wikipedia.org/wiki/LCARS#Legal
| layer8 wrote:
| It should be really something like Beneficial Audio Realtime
| Recognition Electronic Transformer.
| rkagerer wrote:
| _I wanted to create a voice assistant that is completely offline
| and doesn 't require any internet connection. This is because I
| wanted to ensure that the user's privacy is protected and that
| the user's data is not being sent to any third party servers._
|
| Props, and thank you for this.
| ornornor wrote:
| Ditto!
| pyaamb wrote:
| I would love for Apple/Google to introduce some tech that would
| make it provable/verifiable that the camera/mic on the device
| can only be captured when the indicator is on and that it isn't
| possible for apps or even higher layers of OS to spoof this
| herval wrote:
| That's allegedly the case in iOS (not the provable part, but
| I wonder if anyone managed to disprove it yet?)
| pyaamb wrote:
| I'm thinking perhaps a standardized open design circuit
| that can you can view by opening up back cover and zooming
| in with a microscope.
|
| feel like privacy tech like this that seemed wildly
| overkill for everyday users becomes necessary as the value
| of collecting data and profiling humans goes through the
| roof
| abraae wrote:
| Red nail polish would like a word
| pyaamb wrote:
| I missed the reference. Red nail polish?
| cmcconomy wrote:
| Funny, I just picked up a device for use with
| https://heywillow.io for similar reasons
| knodi123 wrote:
| me too, but I bricked mine when flashing the bios. just a
| fluke, nothing to be done about it.
| aci_12 wrote:
| how does wake word work? Does it keep listening and ignore if the
| last few seconds does not have the wake word/phrase?
| knodi123 wrote:
| that's the general idea, yes. Or rather, store several chunks
| of audio, and discard the oldest. aka "Rolling window".
| dasl wrote:
| What latency do you get? I'd be interested in seeing a demo
| video.
| nkaz123 wrote:
| Fully depends on the model, how much conversational context you
| provide, but if you keep things to a bare minimum, ~< 5 seconds
| from message received to starting the response using Llama 3
| 8B. I'm also using a vision language model,
| https://moondream.ai/, but that takes around 45 seconds so the
| next idea is to take a more basic image captioning model and
| insert it's output into context and try to cut that time down
| even more.
|
| I also tried using Vulkan, which is supposedly faster, but the
| times were a bit slower than normal CPU for Llama CPP.
| pawelduda wrote:
| All I need is a voice assistant that: - RPi 4 can handle, - I can
| integrate with HomeAssistant, - is offline only, and doesn't send
| my data anywhere.
|
| This project seems to be ticking most, if not all of the boxes,
| compared to anything else I've seen. Good job!
|
| While at it, can someone drop a recommendation for a Rpi-
| compatible mic for Alexa-like usecase?
| 8xeh wrote:
| https://www.robotshop.com/products/respeaker-usb-microphone-...
| baobun wrote:
| Check out Rhasspy.
|
| You won't get anything practically useful running LLMs on the
| 4B but you also don't strictly need LLM-based models.
|
| In the Rhasspy community, a common pattern is to do (cheap and
| lightweight) wake-word detection locally on mic-attached
| satellites (here 4B should be sufficient) and then stream the
| actual recording (more computational resources for better
| results) over the local network to a central hub.
| yangikan wrote:
| I have got good results with Playstation 3 and Playstation 4
| cameras (which also have a mic). They are available for $15-20
| on ebay.
| knodi123 wrote:
| Is it possible to run this on a generic linux box? Or if not, are
| you aware of a similar project that can?
|
| I've googled it before, but the space is crowded and the caveats
| are subtle.
___________________________________________________________________
(page generated 2024-05-13 23:00 UTC)