https://navat.substack.com/p/diy-acoustic-camera-using-uma-16

DIY Acoustic Camera using UMA-16

Make your own acoustic camera with the miniDSP UMA-16 and Acoular

Michael Navat, Apr 16

Acoustic cameras are used to locate sources of sound. You can find quite a few acoustic-camera projects and products online, but they are all either complex or expensive (or both). So I decided to build a simpler acoustic camera. Here's how.

[Image: the located sound source - a 1 kHz beeping portable speaker]

If you want me to write more on the theory of acoustic cameras and beamforming for sound localization, please leave a comment saying so and subscribe for updates.

---------------------------------------------------------------------

Requirements

* miniDSP UMA-16 microphone array + USB camera (~275 USD)
* Tripod
* Python + Acoular

---------------------------------------------------------------------

Process

This project consists of 4 parts:

1. Capture video and 16-channel audio
2. Beamform the audio data into sound pressure levels
3. Visualize the sound pressure levels
4. Merge the video with the audio

Let's get to work.

---------------------------------------------------------------------

Capture

[Image: miniDSP UMA-16 with a USB camera at the center of the microphone array]

Audio

Surprisingly, at the time of this project it was difficult to find software that captures 16 channels at once. I tried Audacity and some CLI tools with no success. However, as with most challenges, a bit of Python code did the trick.

Video

Since I'm using a Mac, I started with the built-in QuickTime Player to capture video. However, I quickly realized it wasn't easy to synchronize the audio and video capture. Again, my solution was some Python code with OpenCV and PyAudio.

My solution

You can use my code from this git repo. (A rough sketch of a minimal 16-channel recorder also appears below, just before the beamforming code.) The recording command is:

./record.sh

The outputs of this step are:

video.avi
audio.wav

---------------------------------------------------------------------

Beamform

Given the audio capture, we want to compute the sound pressure level in each direction. We do that using beamforming. There are actually several different beamforming algorithms; I just used the base beamformer of Acoular, a Python package for beamforming. You need to install Acoular before you can use it.

Acoular uses the H5 (HDF5) format, so you first need to convert the WAV file to an H5 file (a minimal conversion sketch is shown below). All code samples below should be executed in a Jupyter notebook to render the output images.
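Before the beamforming code, here is a rough sketch of the audio half of the capture step, for readers who want to see the idea without digging into the repo. This is a simplified illustration rather than the actual record.sh from the repo: the device index, sample rate, and recording length are assumptions you will need to adjust, and video capture plus audio/video synchronization (handled in the repo with OpenCV) are left out.

Python code

import wave
import pyaudio

RATE = 48000          # assumed UMA-16 sample rate - adjust for your setup
CHANNELS = 16
SECONDS = 10          # assumed recording length
CHUNK = 1024
DEVICE_INDEX = 1      # find the UMA-16 index via p.get_device_info_by_index()

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                input_device_index=DEVICE_INDEX,
                frames_per_buffer=CHUNK)

# Read raw interleaved 16-channel blocks from the array
frames = []
for _ in range(int(RATE / CHUNK * SECONDS)):
    frames.append(stream.read(CHUNK))

stream.stop_stream()
stream.close()

# Write all 16 channels to a single interleaved WAV file
with wave.open('audio.wav', 'wb') as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))

p.terminate()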
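For the WAV-to-H5 conversion, Acoular's TimeSamples reads an HDF5 file containing a 'time_data' dataset of shape (samples, channels) with a 'sample_freq' attribute; this matches Acoular's bundled example data, but check the documentation of your Acoular version. A minimal conversion sketch using SciPy and h5py (neither mentioned in the original post) could look like this:

Python code

import h5py
import numpy as np
from scipy.io import wavfile

sample_rate, data = wavfile.read('audio.wav')   # data shape: (num_samples, 16)

# Convert integer samples to float. Absolute calibration is ignored here,
# which only shifts the dB scale, not the estimated source location.
if data.dtype == np.int16:
    data = data.astype('float32') / 32768.0

with h5py.File('audio.h5', 'w') as f:
    ds = f.create_dataset('time_data', data=data)
    ds.attrs['sample_freq'] = float(sample_rate)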
Python code

from os import path

import acoular
from pylab import figure, plot, axis, imshow, colorbar, show

# Microphone geometry of the UMA-16, shipped with Acoular
micgeofile = path.join(path.split(acoular.__file__)[0], 'xml', 'minidsp_uma16.xml')
datafile = 'audio.h5'

mg = acoular.MicGeom(from_file=micgeofile)
ts = acoular.TimeSamples(name=datafile)
ps = acoular.PowerSpectra(time_data=ts, block_size=128, window='Hanning')
rg = acoular.RectGrid(x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2,
                      z=0.3, increment=0.01)
st = acoular.SteeringVector(grid=rg, mics=mg)
bb = acoular.BeamformerBase(freq_data=ps, steer=st)
pm = bb.synthetic(8000, 3)
Lm = acoular.L_p(pm)

figure(2, figsize=(5, 5))
plot(mg.mpos[0], mg.mpos[1], 'o')
axis('equal')
show()

This code should plot the locations of the microphones on the miniDSP UMA-16:

[Image]

---------------------------------------------------------------------

Visualize

First, let's plot the beamforming result.

Python code

imshow(Lm.T, origin='lower', vmin=Lm.max()-3,
       extent=rg.extend(), interpolation='bicubic')
colorbar()

[Image: location of the sound source]

Here we see the location of the sound source over the entire audio capture. The higher the value of a pixel in the plot (yellow), the higher the sound pressure level. The area of maximal sound pressure level is stretched because the sound source was moving during the recording; see the video below.

To correlate the visual and auditory data, we can superimpose the beamforming data over the video as follows (a rough OpenCV sketch of such an overlay is included at the end of this post):

[Video]

---------------------------------------------------------------------

Merge audio and video

You can merge the audio and video back together, although I recommend not doing so, since the visual result contains all the interesting information. To create a file that contains both video and audio, you can use this FFmpeg command:

ffmpeg -i video.avi -i audio.wav -c:v copy -c:a aac output.mp4

---------------------------------------------------------------------

Alternatives

The UMA-16 result above was actually my second attempt at building an acoustic camera. The first attempt was based on the ReSpeaker 4-microphone array (~25 USD) with a GCC-PHAT (generalized cross-correlation with phase transform) algorithm:

[Image]

The results were surprisingly good. In the following video, my iPhone is playing white noise, and the red marker points at the location of maximal sound pressure level:

[Video]

If you want more information on the ReSpeaker 4-microphone array with the GCC-PHAT algorithm (a minimal sketch of the core computation follows below), or any other further information, please leave a comment below or write me.
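For reference, the core of GCC-PHAT is a cross-correlation between two microphone channels that is whitened in the frequency domain so that only phase information remains; the lag that maximizes it gives the inter-microphone time delay, and the delays across microphone pairs give the direction of arrival. The sketch below is a generic NumPy implementation, not the exact code from the ReSpeaker experiment; the sample rate and microphone spacing in the usage example are assumptions.

Python code

import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=16):
    """Return the estimated delay (in seconds) of `sig` relative to `ref`."""
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    # Phase transform: keep only the phase, discard the magnitude
    R /= np.abs(R) + 1e-15
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = int(interp * n / 2)
    if max_tau:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Re-center the correlation around zero lag and pick the peak
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

# Example: delay between channels 0 and 1 of a (num_samples, 4) ReSpeaker capture,
# assuming a 16 kHz sample rate and ~0.065 m microphone spacing:
# tau = gcc_phat(data[:, 0], data[:, 1], fs=16000, max_tau=0.065 / 343.0)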
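---------------------------------------------------------------------

Overlaying the beamforming map on the video

As mentioned in the Visualize section, one way to superimpose the beamforming map on the recorded video is to color-map the SPL grid, resize it to the frame size, and alpha-blend it over each frame with OpenCV. The sketch below is a rough illustration under assumptions not in the original post: it reuses a single map (Lm) for all frames, assumes the camera's field of view roughly matches the beamforming grid, and the file names and blending weights are arbitrary.

Python code

import cv2
import numpy as np

# Lm is the SPL map from the beamforming step above
heat = Lm.T                                        # same orientation as imshow(Lm.T, origin='lower')
heat = np.clip(heat, heat.max() - 10, heat.max())  # keep only the top 10 dB
heat = (255 * (heat - heat.min()) / (heat.max() - heat.min() + 1e-12)).astype(np.uint8)
heat = cv2.applyColorMap(heat, cv2.COLORMAP_JET)
heat = cv2.flip(heat, 0)                           # image rows run top-to-bottom

cap = cv2.VideoCapture('video.avi')
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter('overlay.avi', cv2.VideoWriter_fourcc(*'MJPG'), fps, (w, h))

heat = cv2.resize(heat, (w, h), interpolation=cv2.INTER_CUBIC)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(cv2.addWeighted(frame, 0.6, heat, 0.4, 0))

cap.release()
out.release()

A per-frame overlay (so the hotspot follows a moving source) would instead split the audio into short blocks, run the beamformer on each block, and recompute the color map for the matching video frame.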