[HN Gopher] Show HN: Cloning a musical instrument from 16 second...
___________________________________________________________________
Show HN: Cloning a musical instrument from 16 seconds of audio
In 2020, Magenta released DDSP [1], a machine learning algorithm /
Python library that made it possible to generate good-sounding
instrument synthesizers from about 6-10 minutes of data. While
working with DDSP for a project, we realised that it was actually
quite hard to find 6-10 minutes of clean recordings of monophonic
instruments. In this project, we have combined the DDSP
architecture with a domain adaptation technique from speech
synthesis [2]. This domain adaptation technique works by pre-
training our model on many different recordings from the Solos
dataset [3] first and then fine-tuning parts of the model to the
new recording. This allows us to produce decent sounding instrument
synthesisers from as little as 16 seconds of target audio instead
of 6-10 minutes.
[1] https://arxiv.org/abs/2001.04643
[2] https://arxiv.org/abs/1802.06006
[3] https://arxiv.org/abs/2006.07931
We hope to publish a paper on the topic soon.
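As a rough illustration of the two stages (this is a simplified
sketch, not our actual training code; the module names, sizes and
the 32-instrument table below are made up):

  import torch
  import torch.nn as nn

  class InstrumentDecoder(nn.Module):
      """DDSP-style decoder: maps per-frame (f0, loudness) plus an
      instrument embedding to harmonic amplitudes + a noise gain."""
      def __init__(self, emb_dim=64, hidden=512, n_harmonics=100):
          super().__init__()
          self.rnn = nn.GRU(2 + emb_dim, hidden, batch_first=True)
          self.head = nn.Linear(hidden, n_harmonics + 1)

      def forward(self, f0, loudness, emb):
          # f0, loudness: (B, T, 1); emb: (B, emb_dim)
          emb = emb.unsqueeze(1).expand(-1, f0.shape[1], -1)
          h, _ = self.rnn(torch.cat([f0, loudness, emb], dim=-1))
          return self.head(h)

  decoder = InstrumentDecoder()
  embeddings = nn.Embedding(32, 64)  # one embedding per pre-training instrument

  # Stage 1: pre-train decoder + embeddings on the multi-instrument
  # corpus (e.g. Solos) with a multi-scale spectral loss on the
  # resynthesized audio.

  # Stage 2 (cloning): freeze the shared decoder and fit only a fresh
  # embedding (optionally also the output head) on features extracted
  # from the ~16 second target clip.
  for p in decoder.parameters():
      p.requires_grad = False
  new_emb = nn.Parameter(torch.randn(1, 64) * 0.01)
  opt = torch.optim.Adam([new_emb], lr=1e-3)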
Author : abdljasser2
Score : 82 points
Date : 2022-02-25 13:59 UTC (9 hours ago)
(HTM) web link (erlj.notion.site)
(TXT) w3m dump (erlj.notion.site)
| michae2 wrote:
| Wow, this seems like it could be bigger than Auto-Tune. I wonder
| if we'll reach a point where artists license a DDSP model of
| their instrument or voice, rather than perform directly.
| birdyrooster wrote:
| Well, I suppose a form of this technically has already
| happened. Saki Fujita is the voice actor who was employed to
| create the samples for the Vocaloid Hatsune Miku; the copyright
| holder is Crypton Future Media.
| ajross wrote:
| While it's doing a great job emulating the timbre of the notes,
| what's interesting to me is what it's _not_ doing. Play those top
| two passages to anyone who's spent time around sax players, and
| they'll instantly tell you which one was the "fake".
|
| Real instruments have mechanical behavior that this isn't going to capture
| absent some kind of modeling. Different notes on a sax (to take
| this example) are actuated by different fingers and different
| valves and have different (often multiple) embouchures, and both
| interact with the transitions between pairs of notes (and with
| the dynamics with which they're played). All that complexity is
| absolutely audible in the transitions between notes, and the ML
| layers aren't going to have the ability to pick it up absent a
| _much_ larger training set.
|
| Basically: 16 seconds of audio is enough to get you the frequency
| spectrum of the notes, which you can do with 3-4 lines of
| synthesis code in an imperative regime. It's very much the "easy
| part" of instrument synthesis.
| davio wrote:
| I played sax through college and the second passage sounds
| legit to me.
|
| Couple of interesting things:
| - You can hear the keys and pads hitting and clicking
| - Vibrato on the last note is very realistic
| - Can hear the air and "spit"
| - The timing seems human, especially as it slows down a little
| on the lower notes, which take more air and usually involve
| more awkward fingering with the pinky fingers.
|
| Definitely sounds like a non-professional, probably how an
| average high school player would play the passage.
|
| It sounds like what a player would hear versus a studio
| recording. Reminds me of being in a really reflective & tiny
| practice room.
| smaddox wrote:
| Very cool. I wonder if you could turn something like this into
| the ultimate keyboard synth. That would be one hell of a product.
| mkr-hn wrote:
| While interesting, I'm skeptical an algorithmic approach will
| ever come close to a decent wavetable-based synth (using
| samples/noises from the real instrument) or scripted sampler
| instrument (like Kontakt). It might help if you can't otherwise
| find an existing synth/sample-based instrument but _can_ find it
| in use, but such cases are few and far between.
| nyanpasu64 wrote:
| Does this work on synthesized sounds (like analog or physical
| modeling rather than samplers)? How does it handle the complexity
| of piano timbres (notoriously difficult to synthesize)?
| [deleted]
| samirsd wrote:
| Could this model be applied to mimic the characteristics of a
| guitar amp or pedal, for example?
| cellover wrote:
| Why in the name of God would a site prevent me from using
| shortcut nav to go back in nav history?!
|
| I can't use Cmd + Left on a Mac; is this a Notion thing?
| capableweb wrote:
| I didn't understand what you meant at first; clicking on my back
| arrow on the physical mouse, or the go back icon in the browser
| and Firefox would go back like expected. Then I tried Alt+Left
| arrow on the keyboard, and it didn't work! But it wasn't just
| not going back, like when you have a "too many
| history.pushState()" bug that fucks it up for you. Instead it
| went to the bottom of the page. Then I noticed it was selecting
| some block in the bottom right and that it also happens when
| you just press left arrow on the keyboard without Alt.
|
| So TLDR: bug with some selection/focus thing on the Notion
| page. Poor execution, I rate their implementation of "static
| notebook HTML pages" 7/10.
| cellover wrote:
| Oh right I see, I did not have the time to debug when I
| noticed this issue, thanks for the heads up!
|
| Also sorry for hijacking the comments here, very interesting
| research! It would have been interesting to record the original
| samples in the same setting; we can hear that the flute has
| much less reverb than the 1st saxophone, and a common room
| size / reverb time would be helpful for comparison.
| ushakov wrote:
| nice!
|
| check out "Steerable discovery of neural audio effects" paper
|
| https://csteinmetz1.github.io/steerable-nafx/
|
| i'm compiling a list of research papers for our GuitarML project,
| feel free to open a pull request/issue when your papers are
| published!
|
| https://github.com/GuitarML/mldsp-papers
| skykooler wrote:
| Have to say I'm disappointed with the chosen excerpts in the
| "More examples" section - they do not show very much of the
| generated instrument's ability (the last one is just a single
| note!)
| abdljasser2 wrote:
| Thank you for the feedback! We will synthesize longer excerpts
| in the future.
|
| For the time being there is a colab where you can play with a
| pretrained model.
| squarefoot wrote:
| Any chance this could turn into a MIDI-playable instrument
| with some pre-trained models plus the possibility to submit
| user generated ones?
| skybrian wrote:
| Very interesting! Is there a preprint or demo that you forgot to
| link to? Will you be releasing source or data?
|
| I'm wondering what the hardware requirements would be for real-
| time synthesis. I work on a musical instrument project as a hobby
| and would like a good accordion sound.
| abdljasser2 wrote:
| Hello! Thank you for the interest. There are links to a colab
| etc at the bottom of the blog post!
| pieterhg wrote:
| Where is the blog post?!
| mkl wrote:
| The main HN link: https://erlj.notion.site/Neural-
| Instrument-Cloning-from-very...
| radarsat1 wrote:
| nice, i worked on something similar some years ago [1] but this
| seems more sophisticated. i had a lot of difficulty with atonal
| sounds, especially because i was trying to take a kind of "neural
| wavetable" approach, but overall i found the results intriguing
| nonetheless. curious to check out more recent efforts. i recall
| the DDSP paper and assumed those ideas would apply quite nicely
| to this problem.
|
| [1] https://arxiv.org/abs/1806.09617
| scrozier wrote:
| But if one is trying to create a great sounding instrument
| synthesizer, it is quite easy to _obtain_ 6-10 minutes of clean
| recording. Why do you have to _find_ it? Not sure what the use
| case is here...?
| abdljasser2 wrote:
| Hello!
|
| The immediate use case would be sampling. Say you like a certain
| sound in a song and would like to use it as a starting point
| for your own sound patch.
|
| I also believe that transfer learning has benefits even for
| making great sounding instruments in cases where you have
| access to lots of data. That's my intuition at least.
|
| At the very least, it saves you a lot of memory/bandwidth.
| Instead of having one large model per instrument, you only need
| one large model with a few extra instrument-specific weights.
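|
| As a made-up back-of-the-envelope example of that saving:
|
|     # Hypothetical sizes, only to illustrate the sharing argument.
|     shared_decoder_params = 5_000_000   # one shared DDSP-style decoder
|     per_instrument_weights = 64         # adapted weights per instrument
|     n_instruments = 100
|
|     # float32 storage, in MB
|     separate = n_instruments * shared_decoder_params * 4 / 1e6
|     shared = (shared_decoder_params
|               + n_instruments * per_instrument_weights) * 4 / 1e6
|     print(f"{separate:.0f} MB vs {shared:.0f} MB")  # 2000 MB vs 20 MB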
| al2o3cr wrote:
| Say you like a certain sound in a song and would like
| to use it as a starting point for your own sound
| patch.
|
| As a bonus, you might be the Lucky Winner of a copyright suit
| that eventually establishes a whole new area of case law.
|
| Yeah, it would be ridiculous and unreasonable - but so was "I
| copyrighted these three notes in a row so pay me naow" :(
| syntheweave wrote:
| As a sample user, I would love to have this available in
| the toolbox.
|
| Just reusing the original recording of a sample is equivalent
| to drawing a photorealistic tracing of an image: it
| represents a ground truth, but it's not illustrated in any
| particular artistic direction. And this makes the multisample
| libraries available today akin to "dry references" - they can
| be convincing as reproductions, some of the time, but you're
| stitching them together like a collage of photos.
|
| If you throw the sample into a synthesis engine you can push
| around the parameters, crossfade it into a loop, add some
| envelopes, modulation and layers, and make it a uniquely
| stylized instrument, and this is one way to take the source
| material to a new place by forgoing some realism.
|
| Doing the synthesis through style transfer helps move it in a
| different direction: it gets outside the bounds of directly
| sequencing performance parameters and makes the performance a
| little more like an effect, helping to glue the sound. And I
| think that could be really cool if applied to arbitrary
| source material.
| dimal wrote:
| Just want to chime in and say that I would love to have this
| ability. Will you be adding more documentation on how a
| knowledgeable user could use your library to accomplish this?
| The docs are kinda sparse and I'm not sure how I could
| actually use it.
| willis936 wrote:
| This reminds me of a task in my list that has been sitting
| there for nearly a decade: instrument FIR
| from song (justice - let there be light)
|
| Here is the spectrogram of the sound I'm talking about:
|
| https://imgur.com/kmtoMkd
|
| It's pretty easy to filter out the drums since most of the
| energy is in other bands. Looking at the spectrum again I
| don't think a simple spectral replication will nail the sound
| right. It looks like there is some sort of beat phenomenon
| that isn't present at all center frequencies.
| not1ofU wrote:
| I don't think I understand what you mean, but if I do, then
| you could look into using spleeter. It separates musical
| stems.
|
| https://news.ycombinator.com/item?id=21431071
|
| https://github.com/deezer/spleeter/wiki/2.-Getting-
| started#u...
| willis936 wrote:
| The task I gave myself was to subtract out the drum beat
| (the song graciously gives the isolated loop before the
| instrument comes in), then mix/baseband the instrument to
| whatever frequency I wanted. If all went well I would
| make a complex FIR filter that I would pass tones into.
|
| This model assumes the timbre is independent of the tone,
| but I can see now that this assumption is quite wrong and
| something more complicated (like this ML modeling) would
| be needed.
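|
| For reference, the linear version of what I had in mind looks
| roughly like this (a rough numpy/scipy sketch with made-up
| parameters; it bakes in exactly that wrong timbre-independent-
| of-pitch assumption): spectrally subtract the drum loop's
| average magnitude, then turn the residual spectrum into an FIR
| via an inverse FFT.
|
|     import numpy as np
|     from scipy.signal import stft
|
|     def drumless_fir(mix, drum_loop, sr, n_fft=4096, n_taps=2048):
|         """Subtract the drum loop's average magnitude spectrum,
|         then build an FIR whose magnitude response matches the
|         residual's average spectrum."""
|         _, _, X = stft(mix, sr, nperseg=n_fft)
|         _, _, D = stft(drum_loop, sr, nperseg=n_fft)
|         drum_mag = np.abs(D).mean(axis=1, keepdims=True)
|         residual = np.maximum(np.abs(X) - drum_mag, 0.0).mean(axis=1)
|         # zero-phase FIR from the desired magnitude response,
|         # shifted to be causal and windowed
|         h = np.fft.irfft(residual, n=n_taps)
|         h = np.roll(h, n_taps // 2) * np.hanning(n_taps)
|         return h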
| capableweb wrote:
| Where does the training start to "fade off" in terms of "time
| spent" and "results achieved"? It seems 1 second vs 16
| seconds makes a dramatic difference, but what about 50 seconds vs
| 3600 seconds (1 hour)?
___________________________________________________________________
(page generated 2022-02-25 23:00 UTC)