[HN Gopher] Real-time audio programming 101: time waits for nothing
___________________________________________________________________
Real-time audio programming 101: time waits for nothing
Author : ssfrr
Score : 73 points
Date : 2024-07-09 03:24 UTC (1 day ago)
(HTM) web link (www.rossbencina.com)
(TXT) w3m dump (www.rossbencina.com)
| swatcoder wrote:
| (2011) But a great summary and mostly evergreen
|
| One practical reality it doesn't share is that your audio
| processing (or generation) code is often going to be running in a
| bus shared by a ton of other modules and so you don't have the
| luxury of using "5.6ms" as your deadline for a 5.6ms buffer. Your
| responsibility, often, is to just get as performant as reasonably
| possible so that _everything_ on the bus can be processed in
| those 5.6ms. The pressure is usually much higher than the buffer
| length suggests.
| spacechild1 wrote:
| What do you mean by "bus" and "module" in this context?
| GrantMoyer wrote:
| A module is a piece of software or hardware which is
| independent in some way.
|
| A bus is a shared medium of communication[1]. Often, busses
| are time-division multiplexed[2], so if you want to use the
| bus, but another module is already using it, you need to
| wait.
|
| For example, if your audio buffers are ultimately submitted
| to a sound card over a PCI bus, the submission may need to
| wait for any ongoing transactions on the PCI bus, such as
| messages to a graphics card.
|
| [1]: https://en.wikipedia.org/wiki/Bus_(computing)
|
| [2]: https://en.wikipedia.org/wiki/Time-division_multiplexing
| spacechild1 wrote:
| That is one possible interpretation, but not what they
| meant. That's why I asked; I wasn't sure :)
| swatcoder wrote:
| "Bus" (as I was using it) is the path from some audio source
| to some audio destination and a "module" (as used) would be
| something that takes a buffer of samples on that bus and does
| something with it.
|
| You might _sometimes_ build an app where (through your
| operating system) you connect directly with an input device
| and/or output device and then do all the audio processing
| yourself. In this case, you'd more or less control the whole
| bus and all the code processing samples on it and have a
| _fairly_ true sense of your deadline. (The OS and drivers
| would still be introducing some overhead for mixing or
| resampling, etc, but that's generally of small concern and
| hard to avoid)
|
| Often, though, you're either going to be building a bus and
| applying your own effects _and some others_ (from your OS,
| from team members, from third party plugins/libraries, etc)
| or you're going to be writing some kind of effect/generator
| that gets inserted into somebody else's bus in something like
| a DAW or game. In all these cases, you need to assume that
| all processing code that _isn't_ yours needs all the time
| that you can leave for it and just make your own code as
| efficient as is reasonable.
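|
| To put that in code terms, here is a rough, hypothetical
| sketch of a host's callback in Rust: your module is just one
| process() call among several, all sharing the same deadline.
|
|     // Hypothetical host graph: every module below has to
|     // finish inside the *same* buffer period.
|     trait Module {
|         fn process(&mut self, buf: &mut [f32]);
|     }
|
|     struct Gain {
|         amount: f32,
|     }
|
|     impl Module for Gain {
|         fn process(&mut self, buf: &mut [f32]) {
|             for s in buf.iter_mut() {
|                 *s *= self.amount;
|             }
|         }
|     }
|
|     // The host's audio callback walks the whole chain, so no
|     // single module can budget for the full buffer length.
|     fn host_callback(chain: &mut [Box<dyn Module>],
|                      buf: &mut [f32]) {
|         for module in chain.iter_mut() {
|             module.process(buf);
|         }
|     }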
| spacechild1 wrote:
| Thanks for clarifying. The terms are highly ambiguous (see
| the sibling answer
| https://news.ycombinator.com/item?id=40930298), that's why
| I asked. Personally, I would rather use the terms "audio
| pipeline" or "audio graph" instead of the generic "bus".
|
| > In all these cases, you need to assume that all
| processing code that isn't yours needs all the time that
| you can leave for it and just make your own code as
| efficient as is reasonable.
|
| Yes. For audio programmers that is obvious, in particular
| when it comes to plugins, but for novices it might be worth
| pointing out!
| RossBencina wrote:
| > You might sometimes build an app where (through your
| operating system) you connect directly with an input device
| and/or output device and then do all the audio processing
| yourself.
|
| In case it is not clear, that is the primary case that is
| addressed by the linked blog post (source: I wrote the blog
| post).
| swatcoder wrote:
| And likewise: in case it wasn't clear, it's a great
| article! I wasn't meaning to criticize it, just add a
| little further perspective for the common scenario that
| many first-time audio programming folks encounter.
| ssfrr wrote:
| Definitely good to keep in mind. The thing that I think is
| really interesting about audio programming is that you need to
| be _deterministically_ fast. If your DSP callback executes in
| 1ms 99.99% of the time but sometimes takes 10ms, you're hosed.
|
| I would love to see a modern take on the real-world risk of
| various operations that are technically nondeterministic. I
| wouldn't be surprised if there are cases where the risk of >1ms
| latency is like 1e-30, and dogmatically following this advice
| might be overkill.
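|
| If you wanted to gather that kind of data, here is a rough,
| untested sketch of the instrumentation I mean (in Rust):
| track the worst callback time, since the tail is what decides
| whether you glitch, not the average.
|
|     use std::sync::atomic::{AtomicU64, Ordering};
|     use std::time::Instant;
|
|     // Worst observed callback duration in nanoseconds,
|     // readable from a non-real-time monitoring thread.
|     static WORST_NS: AtomicU64 = AtomicU64::new(0);
|
|     fn audio_callback(buf: &mut [f32]) {
|         let start = Instant::now();
|
|         for s in buf.iter_mut() {
|             *s = 0.0; // actual DSP goes here
|         }
|
|         // fetch_max is lock-free, so it is reasonably safe
|         // to do from the audio thread.
|         let ns = start.elapsed().as_nanos() as u64;
|         WORST_NS.fetch_max(ns, Ordering::Relaxed);
|     }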
| varispeed wrote:
| The real fun is optimising maths. Remove all divisions.
| Create LUTs, approximations, CPU-specific tricks. Despite the
| fact that CPUs are orders of magnitude faster now, they are
| still slow for real-time processing.
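|
| For example, a rough sketch in Rust (table size and accuracy
| picked arbitrarily): hoist the division out of the per-sample
| loop, and swap a transcendental for a lookup table.
|
|     // One divide per block instead of one per sample.
|     fn normalize(buf: &mut [f32], peak: f32) {
|         let inv = 1.0 / peak;
|         for s in buf.iter_mut() {
|             *s *= inv;
|         }
|     }
|
|     // Nearest-neighbour sine table; real code would
|     // interpolate and choose the size by measurement.
|     struct SineLut {
|         table: [f32; 1024],
|     }
|
|     impl SineLut {
|         fn new() -> Self {
|             let mut table = [0.0f32; 1024];
|             for (i, v) in table.iter_mut().enumerate() {
|                 let phase = i as f32 / 1024.0;
|                 *v = (phase * std::f32::consts::TAU).sin();
|             }
|             Self { table }
|         }
|
|         // phase is in turns, i.e. [0, 1) is one cycle.
|         fn sin(&self, phase: f32) -> f32 {
|             let idx = (phase.fract() * 1024.0) as usize % 1024;
|             self.table[idx]
|         }
|     }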
| kdjdjjz wrote:
| Real time does not mean fast, it means deterministic.
|
| Thus such micro-optimizations are seldom used. Quite the
| opposite: you try to avoid jitter, which can be caused by
| caches.
| RossBencina wrote:
| While real-time does not mean fast, micro optimisations
| are frequently used. No one likes slow DSP audio
| software.
| rzzzt wrote:
| Use of Ethernet in real-time systems: packet loss, collision
| rate, and jitter are """good enough""", so it became an
| acceptable replacement for e.g. ATM.
| lukeh wrote:
| Or you use AVB/TSN, which gives you stronger guarantees but
| requires the cooperation of all bridges (switches).
| RossBencina wrote:
| > deterministically fast
|
| Indeed, like all real-time systems you need to think in terms
| of worst-case time complexity, not amortized complexity.
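|
| A concrete illustration of the distinction (a sketch, not
| from the article): Vec::push in Rust is amortized O(1), but
| the occasional push that grows the buffer is O(n) and calls
| the allocator, which is exactly the case a deadline cares
| about.
|
|     // Reserve the worst case up front, off the audio thread.
|     fn make_event_queue() -> Vec<u32> {
|         Vec::with_capacity(4096)
|     }
|
|     // On the audio thread: pushes below capacity are
|     // guaranteed not to reallocate; never grow here.
|     fn push_event(events: &mut Vec<u32>, e: u32) {
|         if events.len() < events.capacity() {
|             events.push(e);
|         }
|         // else: drop or count the overflow
|     }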
| jancsika wrote:
| > If your DSP callback executes in 1ms 99.99% of the time but
| sometimes takes 10ms, you're hosed.
|
| I tend to agree, but...
|
| From my recollection of using Zoom-- it has this bizarre but
| workable recovery method for network interruptions. Either
| the server or the client keeps some amount of the last input
| audio in a buffer. Then if the server detects connection
| problems at time 't', it grabs the buffer from t - 1 seconds
| all the way until the server detects better connectivity.
| Then it starts a race against real time, playing back that
| stretch of the buffer to all clients at something like 1.5x
| speed. From what I remember, this algo typically wins the
| race and saves the client from having to repeat themselves.
|
| That's not happening inside a DSP routine. But my point is
| that some clever engineer(s) at Zoom realized that missing
| deadlines in audio delivery does not _necessarily_ mean
| "hosed." I'm also going to rankly speculate that every other
| video conferencing tool hard-coupled missing deadlines with
| "hosed," and that's why Zoom is the only one where I've ever
| experienced the benefit of that feature.
| ssfrr wrote:
| The context for this article is writing pro audio software,
| where that kind of distortion would generally be as bad as
| a dropout, if not worse.
| RossBencina wrote:
| > dogmatically following this advice might be overkill
|
| It depends on your appetite for risk and the cost of failure.
|
| A big part of the problem is that general purpose computing
| systems (operating systems and hardware) are not engineered
| as real-time systems and there are rarely vendor guarantees
| with respect to real-time behavior. Under such circumstances,
| my position is that you need to code defensively. For
| example, if your operating system memory allocator does not
| guarantee a worst-case bound on execution time, do not use it
| in a real-time context.
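|
| A minimal sketch of that defensive pattern (in Rust):
| allocate everything during setup, then only reuse it from
| the callback.
|
|     // Allocated once, on the setup thread.
|     struct DelayLine {
|         buf: Vec<f32>, // never resized after construction
|         pos: usize,
|     }
|
|     impl DelayLine {
|         fn new(max_len: usize) -> Self {
|             Self { buf: vec![0.0; max_len], pos: 0 }
|         }
|
|         // Safe to call from the audio callback: no
|         // allocation, no locks, no system calls.
|         fn tick(&mut self, input: f32) -> f32 {
|             let out = self.buf[self.pos];
|             self.buf[self.pos] = input;
|             self.pos = (self.pos + 1) % self.buf.len();
|             out
|         }
|     }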
| RossBencina wrote:
| In the context of the article, I assume that the driver has
| arranged sufficient buffering so that the jitter in scheduling
| across a bus (PCI, USB) is masked with respect to the client
| code. But you are correct that communications overhead can cut
| into your compute time if it is not addressed. Some audio APIs
| (e.g. CoreAudio) allow for configuring the buffering margins,
| so you can trade off buffer latency against available audio
| compute %. There is a whole world of debate surrounding how to
| best schedule audio compute (e.g. interrupt driven vs. delay-
| locked high precision timers).
|
| Assuming the context is a desktop OS (which is the context of
| TFA), I think that the main source of non-determinism is
| scheduling jitter (the time between the ideal start of your
| computation, and the time when the OS gives you the CPU to
| start the computation). Of course if you can't arrange
| exclusive or max-priority access to a CPU core you're also
| going to be competing with other processes. Then there is non-
| deterministic execution time on most modern CPUs due to cache
| timing effects, superscalar out of order instruction
| scheduling, inter-core synchronisation, and so on. So yeah,
| you're going to need some margin unless you're on dedicated
| hardware with deterministic compute (e.g. a DSP chip).
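|
| As one concrete knob (using Rust's cpal, mentioned elsewhere
| in this thread, as the example API): you can request a fixed
| buffer size, trading latency against per-callback compute
| margin. Whether the backend honours it is up to the driver.
|
|     use cpal::{BufferSize, SampleRate, StreamConfig};
|
|     // 256 frames at 48 kHz is ~5.3 ms per callback; smaller
|     // buffers cut latency but tighten the deadline.
|     fn low_latency_config() -> StreamConfig {
|         StreamConfig {
|             channels: 2,
|             sample_rate: SampleRate(48_000),
|             buffer_size: BufferSize::Fixed(256),
|         }
|     }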
| swatcoder wrote:
| No, I'm just talking about the common case where you have
| some other stuff going on before or after your own audio
| processing code: a software instrument your framework
| provides, some AudioUnits or gstreamer nodes adding other
| effects, the whole device chain in the DAW that's hosting
| you, etc. _All_ of those things need to get done within your
| window so you can't use the whole thing for yourself.
|
| Most people learning audio programming aren't making a
| standalone audio app where they do all the processing, or at
| least not an interesting one. They're usually either making
| something like a plugin that ends up in somebody else's
| bus/graph, or something like a game or application that
| creates a bus/graph and shoves a bunch of different stuff
| into it.
| user_7832 wrote:
| Slightly tangential: does anyone know any good (Windows-based)
| DSP software? Equalizer APO is decent in theory, but beyond
| being clunky to use, it unfortunately doesn't even seem to
| work 90% of the time.
| bratwurst3000 wrote:
| I think CamillaDSP works on Windows.
| chresko wrote:
| SuperCollider
| spacechild1 wrote:
| Graphical: Pure Data, Max/MSP
|
| Text based: SuperCollider, Csound, Chuck
| Ylpertnodi wrote:
| https://www.airwindows.com/consolidated/
| rzzzt wrote:
| Cockos' JSFX: https://www.cockos.com/jsfx/
| RossBencina wrote:
| AudioMulch?
| spacechild1 wrote:
| A timeless classic! This is the first thing I always recommend to
| anyone interested in real-time audio programming.
| chaosprint wrote:
| Great resource! For those interested in learning the fundamentals
| of audio programming, I highly recommend starting with Rust.
|
| The cpal library in Rust is excellent for developing cross-
| platform desktop applications. I'm currently maintaining this
| library:
|
| https://github.com/chaosprint/asak
|
| It's a cross-platform audio recording/playback CLI tool with TUI.
| The source code is very simple to read. PRs are welcome, and I
| really hope Linux users can help to test and review new PRs :)
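|
| For anyone starting out, here is a minimal, untested output-
| stream sketch with cpal (~0.15), error handling kept brief.
| The closure is the real-time callback the article is about,
| and it assumes the device's default format is f32.
|
|     use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
|
|     fn main() -> Result<(), Box<dyn std::error::Error>> {
|         let host = cpal::default_host();
|         let device = host
|             .default_output_device()
|             .expect("no output device");
|         let config = device.default_output_config()?.config();
|
|         // Runs on the real-time audio thread: no allocation,
|         // no locks, no blocking I/O in here.
|         let stream = device.build_output_stream(
|             &config,
|             move |data: &mut [f32],
|                   _: &cpal::OutputCallbackInfo| {
|                 for sample in data.iter_mut() {
|                     *sample = 0.0; // silence; DSP goes here
|                 }
|             },
|             |err| eprintln!("stream error: {err}"),
|             None, // no build timeout
|         )?;
|
|         stream.play()?;
|         std::thread::sleep(std::time::Duration::from_secs(1));
|         Ok(())
|     }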
|
| When developing Glicol(https://glicol.org), I documented my
| experience of "fighting" with real-time audio in the browser in
| this paper:
|
| https://webaudioconf.com/_data/papers/pdf/2021/2021_8.pdf
|
| Throughout the process, Paul Adenot's work was immensely helpful.
| I highly recommend his blog:
|
| https://blog.paul.cx/post/profiling-firefox-real-time-media-...
|
| I am currently writing a wasm audio module system, and hope to
| publish it here soon.
| nyanpasu64 wrote:
| Is it still the case that cpal doesn't support "synchronous"
| duplex audio where the program inputs audio from a source and
| outputs it to a sink (either with feedback or outputting
| unrelated audio), with an integer number of periods (as little
| as 2) of software-level latency if you copy source buffers to
| the sink? Last time I used it, each stream was opened in
| input or output mode, and opening both gave no guaranteed
| timing relation between them.
| marcod wrote:
| Off topic. Anybody else like Thursday Next? Had to think of "Time
| waits for no man!"
| brcmthrowaway wrote:
| This seems super outdated. Isn't CoreAudio HW accelerated now?
| jmkr wrote:
| As a web developer, learning music and audio programming makes my
| mind melt. We often say "real time" when we mean "fast." But in
| audio, real time means "really fast, all the time", and
| deterministically so.
|
| If your tempo drifts, then you're not going to hear the rhythm
| correctly. If you have a bit of latency on your instrument, it's
| like turning on a delay pedal where the only signal coming
| through is the delay.
|
| One might assume that if you just follow audio programming
| guides you can do all this, but you still need to have your
| system set up to handle real-time audio, in addition to your
| program.
|
| It's all noticeable.
___________________________________________________________________
(page generated 2024-07-10 23:00 UTC)