[HN Gopher] Real-time audio programming 101: time waits for nothing
       ___________________________________________________________________
        
       Real-time audio programming 101: time waits for nothing
        
       Author : ssfrr
       Score  : 73 points
       Date   : 2024-07-09 03:24 UTC (1 days ago)
        
 (HTM) web link (www.rossbencina.com)
 (TXT) w3m dump (www.rossbencina.com)
        
       | swatcoder wrote:
       | (2011) But a great summary and mostly evergreen
       | 
       | One practical reality it doesn't share is that your audio
       | processing (or generation) code is often going to be running in a
       | bus shared by a ton of other modules and so you don't have the
       | luxury of using "5.6ms" as your deadline for a 5.6ms buffer. Your
       | responsibility, often, is to just get as performant as reasonably
       | possible so that _everything_ on the bus can be processed in
       | those 5.6ms. The pressure is usually much higher than the buffer
       | length suggests.
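        | 
        | As a rough back-of-the-envelope sketch (the numbers here are
        | made up, just to illustrate the budgeting):
        | 
        |     // Hypothetical figures: a 256-frame buffer at 44.1 kHz.
        |     constexpr double kSampleRate   = 44100.0;
        |     constexpr int    kBufferFrames = 256;
        |     constexpr double kBufferMs =
        |         1000.0 * kBufferFrames / kSampleRate;     // ~5.8 ms
        |     // If ~8 modules share the graph, aim well under an equal
        |     // share so the whole chain (plus driver overhead) still
        |     // fits inside one callback.
        |     constexpr double kModuleBudgetMs =
        |         0.5 * (kBufferMs / 8.0);                  // ~0.36 ms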
        
         | spacechild1 wrote:
         | What do you mean by "bus" and "module" in this context?
        
           | GrantMoyer wrote:
           | A module is a piece of software or hardware which is
           | independent in some way.
           | 
           | A bus is a shared medium of communication[1]. Often, busses
           | are time-division multiplexed[2], so if you want to use the
           | bus, but another module is already using it, you need to
           | wait.
           | 
           | For example, if your audio buffers are ultimately submitted
           | to a sound card over a PCI bus, the submission may need to
           | wait for any ongoing transactions on the PCI bus, such as
           | messages to a graphics card.
           | 
           | [1]: https://en.wikipedia.org/wiki/Bus_(computing)
           | 
           | [2]: https://en.wikipedia.org/wiki/Time-division_multiplexing
        
             | spacechild1 wrote:
              | That is one possible interpretation, but not what they
              | meant. That's why I asked; I wasn't sure :)
        
           | swatcoder wrote:
           | "Bus" (as I was using it) is the path from some audio source
           | to some audio destination and a "module" (as used) would be
           | something that takes a buffer of samples on that bus and does
           | something with it.
           | 
           | You might _sometimes_ build an app where (through your
           | operating system) you connect directly with an input device
            | and/or output device and then do all the audio processing
           | yourself. In this case, you'd more or less control the whole
           | bus and all the code processing samples on it and have a
           | _fairly_ true sense of your deadline. (The OS and drivers
           | would still be introducing some overhead for mixing or
            | resampling, etc., but that's generally of small concern and
           | hard to avoid)
           | 
           | Often, though, you're either going to be building a bus and
           | applying your own effects _and some others_ (from your OS,
            | from team members, from third party plugins/libraries, etc.)
           | or you're going to be writing some kind of effect/generator
           | that gets inserted into somebody else's bus in something like
           | a DAW or game. In all these cases, you need to assume that
            | all processing code that _isn't_ yours needs all the time
           | that you can leave for it and just make your own code as
           | efficient as is reasonable.
        
             | spacechild1 wrote:
              | Thanks for clarifying. The terms are highly ambiguous (see
             | the sibling answer
             | https://news.ycombinator.com/item?id=40930298), that's why
             | I asked. Personally, I would rather use the terms "audio
             | pipeline" or "audio graph" instead of the generic "bus".
             | 
             | > In all these cases, you need to assume that all
             | processing code that isn't yours needs all the time that
             | you can leave for it and just make your own code as
             | efficient as is reasonable.
             | 
             | Yes. For audio programmers that is obvious, in particular
             | when it comes to plugins, but for novices it might be worth
             | pointing out!
        
             | RossBencina wrote:
             | > You might sometimes build an app where (through your
             | operating system) you connect directly with an input device
             | and/or output device and then do all the audio processing
             | yourself.
             | 
             | In case it is not clear, that is the primary case that is
             | addressed by the linked blog post (source: I wrote the blog
             | post).
        
               | swatcoder wrote:
               | And likewise: in case it wasn't clear, it's a great
               | article! I wasn't meaning to criticize it, just add a
               | little further perspective for the common scenario that
               | many first-time audio programming folks encounter.
        
         | ssfrr wrote:
         | Definitely good to keep in mind. The thing that I think is
         | really interesting about audio programming is that you need to
         | be _deterministically_ fast. If your DSP callback executes in
         | 1ms 99.99% of the time but sometimes takes 10ms, you're hosed.
         | 
         | I would love to see a modern take on the real-world risk of
         | various operations that are technically nondeterministic. I
         | wouldn't be surprised if there are cases where the risk of >1ms
         | latency is like 1e-30, and dogmatically following this advice
         | might be overkill.
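          | 
          | A crude way to gather that kind of data is to track the worst
          | case observed inside the callback itself (a sketch, not from
          | the article; the callback signature is a stand-in):
          | 
          |     #include <atomic>
          |     #include <chrono>
          | 
          |     // Written only by the audio thread; a UI or logging
          |     // thread may read it.
          |     std::atomic<double> g_worstMs{0.0};
          | 
          |     void audioCallback(float* out, int frames)
          |     {
          |         using clock = std::chrono::steady_clock;
          |         const auto t0 = clock::now();
          | 
          |         // ... the actual DSP work on 'out' goes here ...
          | 
          |         const std::chrono::duration<double, std::milli> d =
          |             clock::now() - t0;
          |         if (d.count() > g_worstMs.load(std::memory_order_relaxed))
          |             g_worstMs.store(d.count(), std::memory_order_relaxed);
          |     }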
        
           | varispeed wrote:
           | The real fun is optimising maths. Remove all divisions.
            | Create LUTs, approximations, CPU-specific tricks. Even though
            | CPUs are orders of magnitude faster now, they are still slow
            | for real-time processing.
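            | 
            | A toy example of the sort of thing meant (a sketch, not any
            | particular product's code): hoist a per-sample division out
            | of the loop and multiply by a precomputed reciprocal.
            | 
            |     // Naive: one division per sample.
            |     void scale_slow(float* buf, int n, float peak) {
            |         for (int i = 0; i < n; ++i)
            |             buf[i] = buf[i] / peak;
            |     }
            | 
            |     // Hoisted: one division per buffer, then multiplies
            |     // (assumes peak != 0).
            |     void scale_fast(float* buf, int n, float peak) {
            |         const float inv = 1.0f / peak;
            |         for (int i = 0; i < n; ++i)
            |             buf[i] *= inv;
            |     }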
        
             | kdjdjjz wrote:
              | Real time does not mean fast, it means deterministic.
              | 
              | Thus such micro-optimizations are seldom used. Quite the
              | opposite: you try to avoid jitter, which could be the
              | result of caches.
        
               | RossBencina wrote:
               | While real-time does not mean fast, micro optimisations
               | are frequently used. No one likes slow DSP audio
               | software.
        
           | rzzzt wrote:
            | Use of Ethernet in real-time systems: packet loss, collision
            | rate, and jitter are """good enough""", so it became an
            | acceptable replacement for e.g. ATM.
        
             | lukeh wrote:
             | Or you use AVB/TSN which gives you stronger guarantees, but
             | requires cooperation of all bridges (switches).
        
           | RossBencina wrote:
           | > deterministically fast
           | 
           | Indeed, like all real-time systems you need to think in terms
           | of worst-case time complexity, not amortized complexity.
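            | 
            | A common concrete case (a sketch, not from the article):
            | std::vector::push_back is amortized O(1), but the occasional
            | reallocation is O(n) and hits the allocator, so reserve in
            | the setup path and keep the callback within that capacity.
            | 
            |     #include <vector>
            | 
            |     std::vector<float> scratch;
            | 
            |     void prepare(int maxFrames) {      // non-real-time setup
            |         scratch.reserve(maxFrames);    // pay allocation here
            |     }
            | 
            |     // Real-time callback; assumes frames <= maxFrames.
            |     void process(const float* in, int frames) {
            |         scratch.clear();               // capacity is kept
            |         for (int i = 0; i < frames; ++i)
            |             scratch.push_back(in[i] * 0.5f);  // no realloc
            |         // ... use scratch ...
            |     }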
        
           | jancsika wrote:
           | > If your DSP callback executes in 1ms 99.99% of the time but
           | sometimes takes 10ms, you're hosed.
           | 
           | I tend to agree, but...
           | 
            | From my recollection of using Zoom: it has this bizarre but
            | workable recovery method for network interruptions. Either
            | the server or the client keeps some amount of the last input
            | audio in a buffer. Then, if the server detects connection
            | problems at time 't', it grabs the buffer from t - 1 seconds
            | all the way until the server detects better connectivity.
            | Then it starts a race against real time, playing back that
            | stretch of the buffer to all clients at something like 1.5x
            | speed. From what I remember, this algo typically wins the
            | race and saves the client from having to repeat themselves.
           | 
           | That's not happening inside a DSP routine. But my point is
           | that some clever engineer(s) at Zoom realized that missing
           | deadlines in audio delivery does not _necessarily_ mean
           | "hosed." I'm also going to rankly speculate that every other
           | video conferencing tool hard-coupled missing deadlines with
           | "hosed," and that's why Zoom is the only one where I've ever
           | experienced the benefit of that feature.
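            | 
            | (To illustrate the catch-up idea only, not Zoom's actual
            | code: a naive drain of the backlog at 1.5x via linear
            | interpolation. A real product would time-stretch without the
            | pitch shift this causes.)
            | 
            |     #include <cstddef>
            |     #include <vector>
            | 
            |     // Reads 'backlog' faster than real time into 'out'.
            |     // 'pos' persists across calls; returns frames written.
            |     size_t catchUp(const std::vector<float>& backlog,
            |                    float* out, size_t outFrames,
            |                    double& pos, double rate = 1.5)
            |     {
            |         size_t n = 0;
            |         while (n < outFrames && pos + 1.0 < backlog.size()) {
            |             const size_t i = static_cast<size_t>(pos);
            |             const double frac = pos - i;
            |             out[n++] = static_cast<float>(
            |                 (1.0 - frac) * backlog[i] +
            |                 frac * backlog[i + 1]);
            |             pos += rate;
            |         }
            |         return n;
            |     }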
        
             | ssfrr wrote:
             | The context for this article is writing pro audio software,
             | where that kind of distortion would generally be as bad as
             | a dropout, if not worse.
        
           | RossBencina wrote:
           | > dogmatically following this advice might be overkill
           | 
           | It depends on your appetite for risk and the cost of failure.
           | 
           | A big part of the problem is that general purpose computing
           | systems (operating systems and hardware) are not engineered
           | as real-time systems and there are rarely vendor guarantees
           | with respect to real-time behavior. Under such circumstances,
           | my position is that you need to code defensively. For
           | example, if your operating system memory allocator does not
           | guarantee a worst-case bound on execution time, do not use it
           | in a real-time context.
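            | 
            | A minimal sketch of that kind of defensive pattern (details
            | are mine, assuming C++11 atomics): pre-size everything up
            | front and pass data to the audio thread through a fixed-
            | capacity wait-free FIFO instead of allocating or locking in
            | the callback.
            | 
            |     #include <array>
            |     #include <atomic>
            |     #include <cstddef>
            | 
            |     // Single-producer/single-consumer FIFO: fixed capacity,
            |     // no locks, no allocation after construction.
            |     template <typename T, std::size_t N>  // N: power of two
            |     class SpscFifo {
            |     public:
            |         bool push(const T& v) {     // e.g. the UI thread
            |             const auto w = w_.load(std::memory_order_relaxed);
            |             const auto r = r_.load(std::memory_order_acquire);
            |             if (w - r == N) return false;        // full
            |             buf_[w & (N - 1)] = v;
            |             w_.store(w + 1, std::memory_order_release);
            |             return true;
            |         }
            |         bool pop(T& v) {            // the audio callback
            |             const auto r = r_.load(std::memory_order_relaxed);
            |             const auto w = w_.load(std::memory_order_acquire);
            |             if (r == w) return false;            // empty
            |             v = buf_[r & (N - 1)];
            |             r_.store(r + 1, std::memory_order_release);
            |             return true;
            |         }
            |     private:
            |         std::array<T, N> buf_{};
            |         std::atomic<std::size_t> w_{0}, r_{0};
            |     };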
        
         | RossBencina wrote:
         | In the context of the article, I assume that the driver has
         | arranged sufficient buffering so that the jitter in scheduling
         | across a bus (PCI, USB) is masked with respect to the client
         | code. But you are correct that communications overhead can cut
         | into your compute time if it is not addressed. Some audio APIs
         | (e.g. CoreAudio) allow for configuring the buffering margins,
         | so you can trade off buffer latency against available audio
         | compute %. There is a whole world of debate surrounding how to
         | best schedule audio compute (e.g. interrupt driven vs. delay-
         | locked high precision timers).
         | 
         | Assuming the context is a desktop OS (which is the context of
         | TFA), I think that the main source of non-determinism is
         | scheduling jitter (the time between the ideal start of your
         | computation, and the time when the OS gives you the CPU to
         | start the computation). Of course if you can't arrange
         | exclusive or max-priority access to a CPU core you're also
         | going to be competing with other processes. Then there is non-
         | deterministic execution time on most modern CPUs due to cache
         | timing effects, superscalar out of order instruction
         | scheduling, inter-core synchronisation, and so on. So yeah,
         | you're going to need some margin unless you're on dedicated
         | hardware with deterministic compute (e.g. a DSP chip).
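          | 
          | To make the latency-vs-margin trade-off concrete, here is a
          | sketch using PortAudio as one example API (the exact margins
          | any host adds on top are platform-specific, and error handling
          | is omitted):
          | 
          |     #include <portaudio.h>
          | 
          |     PaStream* openOutput(PaStreamCallback* cb, void* user)
          |     {
          |         Pa_Initialize();
          | 
          |         PaStreamParameters out{};
          |         out.device       = Pa_GetDefaultOutputDevice();
          |         out.channelCount = 2;
          |         out.sampleFormat = paFloat32;
          |         // Larger suggestedLatency -> more buffering margin
          |         // (more usable compute per callback) but higher
          |         // output latency; defaultLowOutputLatency sits at
          |         // the other end of the trade-off.
          |         out.suggestedLatency =
          |             Pa_GetDeviceInfo(out.device)->defaultHighOutputLatency;
          | 
          |         PaStream* stream = nullptr;
          |         Pa_OpenStream(&stream, nullptr, &out, 48000.0,
          |                       256 /* frames per callback */,
          |                       paNoFlag, cb, user);
          |         Pa_StartStream(stream);
          |         return stream;
          |     }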
        
           | swatcoder wrote:
           | No, I'm just talking about the common case where you have
           | some other stuff going on before or after your own audio
           | processing code: a software instrument your framework
           | provides, some AudioUnits or gstreamer nodes adding other
           | effects, the whole device chain in the DAW that's hosting
           | you, etc. _All_ of those things need to get done within your
            | window so you can't use the whole thing for yourself.
           | 
           | Most people learning audio programming aren't making a
           | standalone audio app where they do all the processing, or at
           | least not an interesting one. They're usually either making
           | something like a plugin that ends up in somebody else's
           | bus/graph, or something like a game or application that
           | creates a bus/graph and shoves a bunch of different stuff
           | into it.
        
       | user_7832 wrote:
        | Slightly tangential: does anyone know any good (Windows-based)
        | DSP software? Equalizer APO is decent in theory, but beyond
        | being clunky to use it unfortunately doesn't even seem to work
        | 90% of the time.
        
         | bratwurst3000 wrote:
          | I think CamillaDSP works for Windows.
        
         | chresko wrote:
         | SuperCollider
        
         | spacechild1 wrote:
         | Graphical: Pure Data, Max/MSP
         | 
         | Text based: SuperCollider, Csound, Chuck
        
         | Ylpertnodi wrote:
         | https://www.airwindows.com/consolidated/
        
         | rzzzt wrote:
         | Cockos' JSFX: https://www.cockos.com/jsfx/
        
         | RossBencina wrote:
         | AudioMulch?
        
       | spacechild1 wrote:
       | A timeless classic! This is the first thing I always recommend to
       | anyone interested in real-time audio programming.
        
       | chaosprint wrote:
       | Great resource! For those interested in learning the fundamentals
       | of audio programming, I highly recommend starting with Rust.
       | 
        | The cpal library in Rust is excellent for developing cross-
       | platform desktop applications. I'm currently maintaining this
       | library:
       | 
       | https://github.com/chaosprint/asak
       | 
       | It's a cross-platform audio recording/playback CLI tool with TUI.
        | The source code is very simple to read. PRs are welcome, and I
       | really hope Linux users can help to test and review new PRs :)
       | 
        | When developing Glicol (https://glicol.org), I documented my
       | experience of "fighting" with real-time audio in the browser in
       | this paper:
       | 
       | https://webaudioconf.com/_data/papers/pdf/2021/2021_8.pdf
       | 
       | Throughout the process, Paul Adenot's work was immensely helpful.
       | I highly recommend his blog:
       | 
       | https://blog.paul.cx/post/profiling-firefox-real-time-media-...
       | 
       | I am currently writing a wasm audio module system, and hope to
       | publish it here soon.
        
         | nyanpasu64 wrote:
         | Is it still the case that cpal doesn't support "synchronous"
         | duplex audio where the program inputs audio from a source and
         | outputs it to a sink (either with feedback or outputting
         | unrelated audio), with an integer number of periods (as little
         | as 2) of software-level latency if you copy source buffers to
         | the sink? Last time I used it, each stream is opened in input
          | or output mode, and opening both gives no guaranteed timing
          | relation between them.
        
       | marcod wrote:
       | Off topic. Anybody else like Thursday Next? Had to think of "Time
       | waits for no man!"
        
       | brcmthrowaway wrote:
       | This seems super outdated. Isn't CoreAudio HW accelerated now?
        
       | jmkr wrote:
       | As a web developer, learning music and audio programming makes my
        | mind melt. We often say "real time" when we mean "fast." But in
        | audio, real time means "really fast, all the time," and fairly
        | deterministically at that.
       | 
       | If your tempo drifts, then you're not going to hear the rhythm
       | correctly. If you have a bit of latency on your instrument, it's
       | like turning on a delay pedal where the only signal coming
       | through is the delay.
       | 
        | One might assume that if you just follow audio programming
        | guides then you can do all this, but you still need to have your
        | system set up to handle real-time audio, in addition to your
        | program.
       | 
       | It's all noticeable.
        
       ___________________________________________________________________