Lessons learnt building a real-time audio application in Python
Author : spmvg
Score : 44 points
Date : 2024-09-09 14:18 UTC (8 hours ago)
(HTM) web link (www.vangemert.dev)
| camtarn wrote:
| "It turns out that the round-trip time from an audio interface,
| through a computer (DAW) and back to the speakers takes a few
| hundreds of milliseconds, making direct audio processing
| impossible using consumer hardware." - uh, what? Real-time audio
| processing has been a thing for at least a couple of decades. It
| doesn't work by default on Windows, but you can get free drivers
| (ASIO4All) which make it work on pretty much any hardware. And it
| works out of the box on Macs.
|
| "Latency seems to shift by a few tens of milliseconds when
| restarting the application." - this makes me think you are using
| the wrong API for your sound input/output. With modern realtime
| audio support, your total latency from input to output should be
| less than 10ms _total_.
|
| "I expected that memory usage would get out of hand quite fast
| due to the ever growing dictionary of arrays containing audio
| data, but this does not happen in practice. I suspect that the
| good performance is caused by highly optimized memory management
| of Python and modern OSes." - without concrete figures it's quite
| hard to evaluate this, but what did you expect to happen? With a
| 44.1 kHz stereo audio stream, you should be storing 88.2 thousand
| samples a second. Say you're using 64-bit floats, as a worst
| case. Your audio storage should be growing at about 689 KiB/sec,
| plus a bit extra for object overhead. How much is it actually
| growing by? Of course Python is probably doing a bunch of
| allocation and deallocation for temporary objects behind the
| scenes, but hopefully you should not need to lean too hard on
| 'highly optimized memory management' - ideally, you should hardly
| be allocating anything at all. Also, why a dict, rather than just
| a large array that you can occasionally make bigger?
|
| Finally ... I'm sure you already know that Python is possibly the
| worst mainstream language you could pick for realtime audio
| processing. But that is fine. I have tried to build audio stuff
| in Python too! Sometimes using the wrong tool for the job is part
| of the fun.
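
The storage estimate above can be checked with a few lines of Python. This is a minimal sketch, not the application's actual code: `storage_rate_bytes` and `GrowableBuffer` are hypothetical names, and the buffer illustrates the "large array that you occasionally make bigger" idea using only the standard library `array` module.

```python
from array import array


def storage_rate_bytes(sample_rate_hz: int, channels: int, bytes_per_sample: int) -> int:
    """Bytes of raw audio produced per second of recording."""
    return sample_rate_hz * channels * bytes_per_sample


class GrowableBuffer:
    """Preallocated float64 sample store that doubles in capacity when full.

    Hypothetical helper for illustration only.
    """

    def __init__(self, capacity: int = 1024) -> None:
        self._data = array("d", [0.0]) * capacity  # zero-filled slots
        self._length = 0  # number of slots actually in use

    def extend(self, samples) -> None:
        chunk = array("d", samples)
        needed = self._length + len(chunk)
        if needed > len(self._data):
            # Grow rarely: at least double, so appends stay cheap on average.
            new_capacity = max(needed, 2 * len(self._data))
            self._data.extend(array("d", [0.0]) * (new_capacity - len(self._data)))
        self._data[self._length:needed] = chunk
        self._length = needed

    def __len__(self) -> int:
        return self._length

    @property
    def capacity(self) -> int:
        return len(self._data)


rate = storage_rate_bytes(44_100, 2, 8)  # 44.1 kHz, stereo, 64-bit floats
print(f"{rate} bytes/s = {rate / 1024:.0f} KiB/s")  # 705600 bytes/s = 689 KiB/s
```

At that rate an hour of recording is roughly 2.4 GiB in the 64-bit worst case, so the sample format matters more than the container.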
| PhunkyPhil wrote:
| +1. In Ableton on Windows you can get your latency down to
| ~40ms without a dedicated sound card using ASIO. Mac's drivers
| are even better with sub ~20 ms on my m2 pro IIRC.
| shannonclaude wrote:
| +1 to the comments here. Part of the issue here is running
| these applications in Python. It's not really optimized to
| handle these loads and do DSP-based compute efficiently.
| jancsika wrote:
| > Mac's drivers are even better with sub ~20 ms on my m2 pro
| IIRC.
|
| Just to be clear that you're measuring apples to apples with
| OP:
|
| You are measuring less than 40ms _roundtrip latency_ on your
| Mac. Is this correct?
| spmvg wrote:
| Interesting comment! I'm going to figure out if using another
| driver allows me to get under 20 ms in latency. Right now I'm
| measuring around 300 ms in latency round-trip, which is not a
| problem because I can correct for it. (I'm using a Focusrite
| Scarlett 2i2 with default drivers.)
|
| The reasoning behind my comment about round-trip time was as
| follows:
| - Right now I'm measuring around 300 ms round-trip time, without
|   processing in between.
| - In the past I've tried to do live effects in Ableton with ASIO
|   drivers (guitar in -> Ableton effects -> out), and the delay was
|   too noticeable. I couldn't play that way without making my ears
|   bleed and I've switched back to pedals since.
|
| One follow up: how could I achieve a total round-trip latency
| of around 10 ms total, as you describe? If I use a buffer of
| 500 samples @ 44.1 kHz, then I am spending already 11 ms just
| filling the buffer. So then the buffers need to become really
| small, causing more processing overhead, right? Not sure if
| this is the way to go.
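
The buffer arithmetic in this question can be made explicit. A minimal sketch, assuming the time to fill one buffer is simply frames divided by sample rate; `buffer_latency_ms` is a hypothetical helper name. A real round trip includes at least one input and one output buffer plus driver and converter overhead, so these figures are lower bounds per buffer, not measured latencies.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: float) -> float:
    """Time needed to fill (or play out) one buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


for frames in (64, 128, 256, 500):
    one_way = buffer_latency_ms(frames, 44_100)
    # Callbacks per second grows as the buffer shrinks: that is the
    # processing-overhead trade-off raised in the comment.
    callbacks_per_s = 44_100 / frames
    print(f"{frames:>4} frames: {one_way:5.1f} ms per buffer, "
          f"{callbacks_per_s:6.1f} callbacks/s")
```

With 500 frames the per-buffer time is about 11.3 ms, matching the figure in the comment; 128 frames brings it under 3 ms at the cost of roughly four times as many callbacks.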
| camtarn wrote:
| Yeah, your Scarlett should be capable of single-digit ms
| latency. If you're on Windows, you need to install its ASIO
| drivers and figure out how to use them from Python. Then,
| yes, use tiny buffers and run your audio processing very fast
| - which is where Python's slowness will probably become a
| real problem.
|
| 10ms latency is how long sound takes to travel 3-and-a-bit
| metres. So if your amp is a few metres from you, you would
| experience that delay between hitting the guitar strings and
| hearing the amplified sound. This should barely be
| noticeable. If you were noticing a delay greater than that in
| your Ableton effects setup, your settings needed tweaking. All
| of this is completely possible - I had a PC-based electronic
| drum setup in 2006, running through the Reason DAW, which had
| 8ms latency between hitting a pad and hearing the result.
|
| Hmm, I wonder if Cython (static Python-to-C compiler) would
| make writing audio code easier/more possible?
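
The comparison between 10 ms of latency and standing a few metres from an amp is easy to verify. A minimal sketch, assuming a speed of sound of about 343 m/s (dry air at roughly 20 °C); the constant and the function name are illustrative choices, not from the thread.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 °C


def distance_for_latency_m(latency_ms: float) -> float:
    """Distance sound travels in the given time, in metres."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000.0


for latency_ms in (8, 10, 96, 300):
    print(f"{latency_ms:>3} ms ~ {distance_for_latency_m(latency_ms):6.2f} m")
```

By the same measure, the 300 ms figure reported earlier corresponds to an amp about 100 m away.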
| spmvg wrote:
| With Ableton and the default ASIO configuration on my
| Scarlett I get 96 ms combined input+output latency without
| any processing in between, so that's probably what made my
| ears bleed before. Tweaking the sample rate and buffer size
| does indeed get me single-digit latencies in Ableton. So I'm
| definitely going to adjust the section about latency,
| thanks for this!
|
| I'm a bit on the fence about what this means for the
| difficult latency calibration routine in the application.
| Ideally I could throw the calibration routine away, but
| then I require that users have ASIO installed, while the
| app now also works with non-ASIO drivers. And indeed Python
| itself might become a bottleneck (making this work in
| Python has been half the fun).
| sim7c00 wrote:
| Try a Clarett interface. It also comes with preamps which
| will make your sound less noisy; Scarlett preamps are
| just absolutely terrible. You can debug your DAW to see
| how it uses drivers and make a Python module which
| exposes similar functions to Python. You will likely
| still want a delay compensation to make things seem free
| of any latency, but it will be doing _much_ less
| compensating. Maybe there's an open-source DAW if you want
| to skip reversing driver calls from a debugger.
| spmvg wrote:
| Debugging an existing DAW to see how they do it under the
| hood is an interesting idea. Haven't done that yet.
|
| About another interface: I do want to keep the
| application supporting cheaper interfaces such as the
| Scarlett, because the target audience (hobby musicians)
| will be using those. Still would be a nice upgrade for
| me!
| kibibu wrote:
| You can take a peek at how Tracktion Engine does it too.
| bongodongobob wrote:
| I would disable any services and programs running in the
| background as well. Years ago I disabled the Windows
| print spooler and it greatly improved jitter. Not sure if
| that's still the case these days though, that was
| probably 10 years ago.
| spmvg wrote:
| So far CPU usage hasn't been an issue at all (<1% usually
| on my not-very-impressive laptop), which surprised me as
| well
| dist-epoch wrote:
| Even without ASIO you should be able to hit 40 ms latency
| on pretty much any Windows audio hardware, including
| motherboard built-in.
|
| If you get 300 ms you're doing something wrong. Note that
| Windows has multiple audio APIs, 300 ms is about the
| latency of the old MME api, you need to use the newer
| one, WASAPI.
| spmvg wrote:
| I apparently only have the old Windows MME drivers indeed
| (and ASIO, on Win10). Need to look into why I can't find
| WASAPI and if I can assume other Windows users have those
| by default.
| michaelrmmiller wrote:
| WASAPI has been available since Windows Vista. It isn't
| its own set of drivers but rather a unifying layer for
| the WDM driver and the preceding mishmash of Windows
| audio APIs (MME, DirectSound, etc). WASAPI supports low-ish
| latencies with Exclusive Mode and then something like
| 10ms buffering in Shared Mode through the Windows audio
| server, I recall.
|
| Put another way: any Windows audio device supports WASAPI
| unless it only ships with an ASIO driver which is
| unlikely, even in the pro audio space.
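
One way to see which Windows audio APIs a Python program can reach is to list the host APIs its audio backend exposes and prefer a low-latency one. The sketch below is an assumption-heavy illustration: the thread does not say which audio library the application uses, so it assumes the third-party `sounddevice` package (a PortAudio binding) and PortAudio's usual host API names. `pick_host_api` and the preference order are hypothetical.

```python
# Assumed PortAudio host API names on Windows; the order is illustrative,
# with the old MME API as the last resort.
DEFAULT_PREFERENCE = (
    "Windows WASAPI",
    "ASIO",
    "Windows WDM-KS",
    "Windows DirectSound",
    "MME",
)


def pick_host_api(available_names, preference=DEFAULT_PREFERENCE):
    """Return (index, name) of the most preferred available host API, or None."""
    for wanted in preference:
        for index, name in enumerate(available_names):
            if name == wanted:
                return index, name
    return None


def list_host_api_names():
    """Names of the host APIs the installed PortAudio build exposes.

    Requires the third-party `sounddevice` package (an assumption here).
    """
    import sounddevice as sd

    return [api["name"] for api in sd.query_hostapis()]


# Example with a hard-coded list, so it runs without audio hardware:
print(pick_host_api(["MME", "Windows DirectSound", "Windows WASAPI"]))
```

If WASAPI is missing from the real list, that points at how the audio backend was built rather than at the Windows installation.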
| ubercore wrote:
| I don't know Windows audio, but on Mac that's wildly high
| latency for a Scarlett interface.
| nicholasjarnold wrote:
| I was once bitten by not understanding that there is a difference
| between "regular" clocks and high performance clocks/timers that
| a developer can take advantage of. At the time I needed a
| sampling routine to run at precisely once per second. My
| inexperience led me to go with something like thread.sleep(1000),
| and I learned quickly that I was mistaken in thinking it'd run
| with little jitter. As others are pointing out, there are also
| similar lessons and solutions when dealing with audio processing
| pipelines.
| spmvg wrote:
| Indeed, it is not a guarantee that the "sleep" will be exactly
| that long. In the code I'm not "sleeping" in any sensitive
| places, instead I'm relying on the callback to the audio stream
| object, which just needs to finish before the next one starts
| (less of a timing constraint).
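
The sleep problem described in this sub-thread can be reduced, though not eliminated, by sleeping until absolute deadlines on a monotonic clock instead of sleeping for a fixed interval. A minimal sketch: `run_periodic` is a hypothetical helper, and each tick can still be late by whatever the OS scheduler adds. The point is only that the lateness does not accumulate.

```python
import time


def run_periodic(task, period_s, count, clock=time.monotonic, sleep=time.sleep):
    """Call task() `count` times, aiming at deadlines start + k * period_s.

    A naive loop of `task(); sleep(period_s)` drifts, because every
    oversleep and every task duration is added to all later ticks.
    Measuring against absolute deadlines keeps the error per tick bounded.
    """
    start = clock()
    for k in range(count):
        deadline = start + k * period_s
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)  # may still wake late; the next tick compensates
        task()


if __name__ == "__main__":
    t0 = time.monotonic()
    run_periodic(lambda: print(f"tick at {time.monotonic() - t0:.3f} s"), 0.1, 5)
```

This is the same principle as the audio callback mentioned in the reply: the timing comes from an external clock (there, the audio device), and the code only has to finish before the next deadline.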
___________________________________________________________________
(page generated 2024-09-09 23:01 UTC)