[HN Gopher] TuneNN: A transformer-based network model for pitch ...
___________________________________________________________________
TuneNN: A transformer-based network model for pitch detection
Author : CMLab
Score : 80 points
Date : 2023-12-19 12:27 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| CMLab wrote:
| A transformer-based network model, pitch tracking for musical
| instruments.
|
| The timbre of musical notes is the result of various combinations
| and transformations of harmonic relationships, harmonic strengths
| and weaknesses, instrument resonant peaks, and structural
| resonant peaks over time.
|
| It utilizes the transformer-based tuneNN network model for
| abstract timbre modeling, supporting tuning for 12+ instrument
| types.
| azinman2 wrote:
| This smells like an automated summary.
| advisedwang wrote:
| It is from the submitter. I think it is intended as a
| submission statement.
| azinman2 wrote:
| That's the first time I've seen/noticed that. I think it
| makes me feel better?
| vessenes wrote:
| This is cool! The very best software-based tuning tech out there
| is probably in piano tuning apps; they cost hundreds of dollars+
| and are specifically made to report on harmonics and other piano
| nuances.
|
| Do you have any comparisons against other pitch detection tech?
| Accuracy? Delay/Responsiveness? I assume it's much more compute
| work than a handcoded FFT type pitch detector.
|
| I think it's possible this would find use in the piano
| world if the output offers something new / something that can
| analyze what a piano tuning maestro can hear and make it
| accessible to a mid-tier tuner.
| jansommer wrote:
| Sounds like you know a thing or two about pitch detection...
| I've been working on a C implementation of YIN and PYIN (a real
| GPL minefield for someone wanting to provide the end result as
| MIT/public domain!), and am wondering if it's a good choice for
| real-time, CPU-bound speech pitch detection, or if there are
| better ways. May I ask what your thoughts are on this?
| ronsor wrote:
| Have you also considered implementing the Nebula[1]
| algorithm?
|
| [1] https://github.com/Sleepwalking/nebula
| jansommer wrote:
| I need non-GPL libraries as a reference. The problem with
| YIN and especially PYIN is that the MIT-code I've found
| sometimes looks a bit too similar to earlier code in GPL.
| Rewriting that into the same but in different code is
| fairly hard. Here I'm assuming that translating eg. GPL
| Python or C++ into C would mean the license is retained
| ska wrote:
| Can you not just write it from the paper(s)? Or is that
| more effort than value to you?
|
| > that translating eg. GPL Python or C++ into C would
| mean the license is retained
|
| It depends a bit on what exactly "translating" means but
| you could easily be a derivative work.
|
| Honestly in that situation I wouldn't even look at the
| code. You might use it to test equivalent behavior after
| you have your own implementation, but only in a gross
| sense.
| jansommer wrote:
| I think I have to look at the code when using other
| people's MIT licensed code... If they have used something
| that's GPL or used someone else's code that turns out to
| be GPL, then it becomes my problem when translating it.
| And I'm not smart enough to just follow a paper
| ska wrote:
| > And I'm not smart enough to just follow a paper
|
| Don't sell yourself short. This is the sort of thing that
| is only straightforward if you have the right background.
| sevagh wrote:
| I have some code here if it interests you:
| https://github.com/sevagh/pitch-detection
|
| My favorite is the McLeod Pitch Method/MPM. Runs fast
| enough for realtime purposes in a WASM example too:
| https://github.com/sevagh/pitchlite
| jansommer wrote:
| Ha! I've translated your YIN code actually! Your
| autocorrelation is pretty cool - GPL versions all use an
| additional FFT. Have been struggling with your PYIN
| implementation because the beta distribution is copied
| from the GPL PYIN source, and the paper just references
| its source code for that part, and as you also found out,
| it's not a real beta distribution. I asked one of the
| PYIN authors (Dixon) if he were willing to change the
| license and he forwarded my mail a week ago - haven't
| heard back. Then there's the absolute_threshold function
| that is the same as in the PYIN source where it says
| "using Joren Six's loop construct". This "loop
| construct" doesn't have a license, because he doesn't
| answer the issues about that in his TarsosDSP library,
| and I'm not sure if I should bother him about a few lines
| of code. I'm assuming it's a coincidence and that's just
| a normal way to find the absolute threshold. I really
| don't want to point fingers here, I'm being paranoid
| because I try to make sure I don't publish something that
| can put people in trouble...
|
| So I have been staring at your code for many hours, and
| the YIN-implementation works well. The PYIN on the other
| hand.. well I necro posted a while ago in one of your
| pull requests I think ;)
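
For readers following the YIN discussion above, here is a minimal
sketch of the core steps as the YIN paper describes them: difference
function, cumulative mean normalized difference, absolute threshold,
and parabolic interpolation. It is written from the published
description only. The function name `yin_pitch`, the default threshold
of 0.15, and the 80-500 Hz search range are assumptions chosen for
illustration, and the paper's final "best local estimate" step is left
out. The plain nested loop costs O(window x lags), so treat it as a
reference for checking behavior, not as a real-time implementation.

```python
import math


def yin_pitch(samples, sample_rate, fmin=80.0, fmax=500.0, threshold=0.15):
    """Estimate f0 in Hz, or return None if no lag passes the threshold."""
    tau_min = int(sample_rate / fmax)
    tau_max = int(sample_rate / fmin)
    window = len(samples) - tau_max  # samples compared at every lag
    if window <= 0:
        raise ValueError("frame too short for fmin")

    # Step 1: difference function d(tau).
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum(
            (samples[j] - samples[j + tau]) ** 2 for j in range(window)
        )

    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Step 3: absolute threshold. Take the first lag whose d' drops below
    # the threshold, then slide forward to the bottom of that dip.
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            break
        tau += 1
    else:
        return None  # nothing periodic enough in the search range

    # Step 4: parabolic interpolation around the chosen lag.
    if 1 <= tau < tau_max:
        a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        denom = a - 2 * b + c
        if denom != 0:
            tau = tau + 0.5 * (a - c) / denom

    return sample_rate / tau


# Usage: a 200 Hz sine sampled at 8 kHz (50 ms frame).
frame = [math.sin(2 * math.pi * 200.0 * i / 8000) for i in range(400)]
print(yin_pitch(frame, 8000))
```

The threshold trades octave errors against missed frames: a lower value
rejects more frames as unvoiced, a higher one accepts weaker dips.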
| xavriley wrote:
| It sounds like you've found it already but the original
| pYin implementation is in the VAMP plugin. Simon Dixon is
| my PhD supervisor but he's quite busy. Feel free to email
| me questions in the meantime. j.x.riley@ the same
| university as Simon. There's also a Python implementation
| in the librosa library which might have a better license
| for your purposes.
| ks2048 wrote:
| That's interesting. Can you point to one of these piano tuning
| apps that are $100+?
| CMLab wrote:
| Based on our current tests, our algorithm shows significantly
| higher accuracy and robustness compared to traditional digital
| signal algorithms such as PEF, NCF, YIN, HPS, etc. Our team is
| working diligently, and we will release benchmark test data and
| results in the near future.
| im3w1l wrote:
| That's pretty nice. Do you have any idea how it does it?
| rrherr wrote:
| How does the accuracy of this compare to CREPE?
|
| https://github.com/marl/crepe
|
| https://github.com/maxrmorrison/torchcrepe
|
| Does anyone know what the current state of the art is, within the
| Music Information Retrieval community?
| CMLab wrote:
| CREPE generally has high latency and error rates in instrument
| pitch recognition, especially for guitar instruments. Our team
| will release benchmark test data and results later.
| rrherr wrote:
| Thanks. I'd love to try TuneNN! Are you releasing a
| pretrained model? How do I run it on a wav file?
| xavriley wrote:
| High latency - agreed but it depends on whether a GPU is
| available or not. If it is then theoretically CREPE could be
| real-time. The error rates for pitch recognition are still
| quite good though for the full CREPE model. I'm interested to
| see the data on this claim.
| ranting-moth wrote:
| To the dev: the tuner gives me an incredibly high error window
| with the following message. It doesn't prompt to access the mic
| (I think that's related). Ubuntu/KDE/Firefox:
|
| An error occurred running the Unity content on this page. See
| your browser JavaScript console for more info. The error was:
| TypeError: 'microphone' (value of 'name' member of
| PermissionDescriptor) is not a valid value for enumeration
| PermissionName.
| checkPermission@https://aifasttune.com/public/web/microphone/microphone.js:3...
| _Microphone_checkPermission@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| invoke_iiii@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| @https://aifasttune.com/public/web/Build/web.wasm:wasm-functi...
| unityFramework/Module._SendMessageString@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| ccall@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| SendMessage@https://aifasttune.com/public/web/Build/web.framework.js:10:...
| SendMessage@https://aifasttune.com/public/web/Build/web.loader.js:1:3343
| loadURL@https://aifasttune.com/public/web/game/fastGameController.js...
| i@https://aifasttune.com/assets/index-64322640.js:1:777
| setup/<@https://aifasttune.com/assets/index-64322640.js:1:611
| CMLab wrote:
| Thank you for providing error feedback. We will work hard to
| address it. Currently, the model-related data is relatively
| large, which may be related to network speed.
| uhoh-itsmaciek wrote:
| I got the same error on Ubuntu/GNOME/Firefox. On Chrome, I
| don't get an error and I'm correctly prompted for microphone
| access, but if I grant permission, it does not seem to pick
| anything up (I've used my mic successfully with other web
| apps).
| joonatan wrote:
| Could someone fill me in on why machine learning would be
| necessary for pitch detection? Isn't it something that could just
| be solved with an FFT, or is it a much more complicated task?
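
As a concrete reference point for the "just use an FFT" approach asked
about above, here is a minimal sketch that reports the strongest
non-DC FFT bin as the pitch. The names `fft` and `fft_peak_pitch`, the
8 kHz sample rate, and the 220 Hz test tone are assumptions made for
illustration. It uses only the standard library, so the FFT is a small
recursive radix-2 version that needs a power-of-two frame length.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_peak_pitch(samples, sample_rate):
    """Return the center frequency (Hz) of the strongest non-DC bin."""
    spectrum = fft(samples)
    half = len(samples) // 2
    mags = [abs(c) for c in spectrum[1:half]]  # skip DC, keep positive freqs
    peak_bin = 1 + max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / len(samples)


SAMPLE_RATE = 8000
N = 2048
tone = [math.sin(2 * math.pi * 220.0 * i / SAMPLE_RATE) for i in range(N)]
estimate = fft_peak_pitch(tone, SAMPLE_RATE)
print(f"estimated {estimate:.2f} Hz, bin width {SAMPLE_RATE / N:.2f} Hz")
```

Even on a clean sine the answer is quantized to the bin width
(sample_rate / N), and on real instruments the strongest bin can be a
harmonic instead of the fundamental, which is where the simple
approach starts to need extra logic.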
| ks2048 wrote:
| Pitch is a *subjective* property, inherently tied to the
| complex processing humans use to perceive sounds. "Simple"
| physical measures like fundamental frequency of a periodic
| signal are very closely related, but for real-world audio
| (which isn't really periodic), the relationship is more complicated.
| chrisshroba wrote:
| Could you elaborate a bit more? It seems to me like the note
| being played would always correspond to the fundamental
| frequency observed. When is this not the case? Maybe as the
| note rings out, the fundamental frequency and first few
| overtones lose power, and all that's still audible are the
| higher overtones?
| gexaha wrote:
| That's actually not true: perceived pitch can differ from
| the fundamental frequency because of psychoacoustics. E.g.
| you can have a "missing fundamental" -
| https://en.wikipedia.org/wiki/Missing_fundamental - or
| other effects like "sum and difference tones", which are
| quite popular in spectralism / spectral music.
| xavriley wrote:
| Simple techniques like autocorrelation can still recover
| a missing fundamental. To answer the GP post, using
| neural networks for this task is overkill for simple,
| clean signals but it can be desirable if you need a)
| extremely high accuracy or b) robust results when there
| are signal degradations like background noise
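
The point about autocorrelation recovering a missing fundamental can
be checked with a small sketch. The test signal contains only the 2nd,
3rd and 4th harmonics of 200 Hz, so there is no energy at 200 Hz
itself, yet the autocorrelation peaks at the 200 Hz period. The
function names `dft_magnitude` and `autocorr_pitch`, the 8 kHz sample
rate, and the 125-1000 Hz search range are assumptions chosen for
illustration.

```python
import cmath
import math


def dft_magnitude(samples, freq, sample_rate):
    """Magnitude of the DFT of `samples` evaluated at one frequency."""
    step = -2j * math.pi * freq / sample_rate
    return abs(sum(x * cmath.exp(step * j) for j, x in enumerate(samples)))


def autocorr_pitch(samples, sample_rate, fmin, fmax):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    tau_min = int(sample_rate / fmax)
    tau_max = int(sample_rate / fmin)
    window = len(samples) - tau_max  # same number of products at every lag
    if window <= 0:
        raise ValueError("frame too short for fmin")

    def r(tau):
        return sum(samples[j] * samples[j + tau] for j in range(window))

    best_lag = max(range(tau_min, tau_max + 1), key=r)
    return sample_rate / best_lag


SAMPLE_RATE = 8000
FMIN, FMAX = 125.0, 1000.0
N = 400 + int(SAMPLE_RATE / FMIN)  # 400-sample window plus the longest lag

# Harmonics 2f, 3f and 4f of f = 200 Hz, with no component at 200 Hz.
signal = [
    sum(math.sin(2 * math.pi * h * 200.0 * i / SAMPLE_RATE) for h in (2, 3, 4))
    for i in range(N)
]

mag_200 = dft_magnitude(signal[:400], 200.0, SAMPLE_RATE)
mag_400 = dft_magnitude(signal[:400], 400.0, SAMPLE_RATE)
estimate = autocorr_pitch(signal, SAMPLE_RATE, FMIN, FMAX)
print(f"|X(200 Hz)| = {mag_200:.6f}, |X(400 Hz)| = {mag_400:.6f}")
print(f"autocorrelation estimate: {estimate} Hz")
```

The lower search bound matters: every multiple of the true period is
also a peak, so letting the range reach 100 Hz would make lag 80 tie
with lag 40 on this perfectly periodic signal.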
| isoprophlex wrote:
| There is a nice little rabbit hole to go in to:
| psychoacoustics of church bells.
|
| https://www.hibberts.co.uk/what-note-do-we-hear-when-a-bell-...
|
| _Almost all musical instruments (such as pianos, organs,
| orchestral instruments and the human voice) have sounds
| that contain a range of frequencies f, 2f, 3f, 4f and so on
| where f is the lowest frequency in the sound. The pitch or
| note we assign to the sound corresponds to the frequency
| f. Frequencies with this regular arrangement are called
| harmonic. The frequencies in the sound of bells, on the
| other hand, are not harmonic, and the pitch we assign to
| the sound of a bell is roughly an octave below the fifth
| partial up ordered by frequency. This partial is called the
| nominal, because it provides the note name of the bell.
| There often isn't a frequency in the bell's sound
| corresponding to the pitch we hear._
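
The rule of thumb in the quoted passage (pitch is roughly an octave
below the fifth partial, the nominal) is simple arithmetic, sketched
here. The partial frequencies are made-up numbers for a hypothetical
bell, not measurements, and the helper names `note_name` and
`bell_strike_pitch` are assumptions. Note naming assumes equal
temperament with A4 = 440 Hz.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name for a frequency in Hz."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def bell_strike_pitch(partials_hz):
    """Approximate perceived pitch: an octave below the fifth-lowest partial."""
    ordered = sorted(partials_hz)
    if len(ordered) < 5:
        raise ValueError("need at least five partials")
    nominal = ordered[4]  # fifth partial when ordered by frequency
    return nominal / 2.0


# Hypothetical, made-up partials (Hz); not data from a real bell.
partials = [230.0, 415.0, 525.0, 650.0, 880.0, 1320.0]
pitch = bell_strike_pitch(partials)
print(f"perceived pitch is about {pitch:.1f} Hz ({note_name(pitch)})")
print("is that frequency one of the partials?", pitch in partials)
```

With these invented numbers the perceived pitch lands on a frequency
that none of the partials has, which is the situation the quote
describes.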
| bravura wrote:
| What's the license?
|
| What are your thoughts on PESTO which learns pitch-prediction
| very well with a small network, and uses a self-supervised
| objective?
|
| https://arxiv.org/abs/2309.02265
|
| https://github.com/SonyCSLParis/pesto
| filterfiber wrote:
| Does anyone know where I should look if I want to detect specific
| sounds? Like a smoke alarm, food bowl dispenser (it's very
| distinct), cat meowing, 3D printer collision, that sort of thing?
| mistercheph wrote:
| You would learn how to do this in the first & second chapters
| of the fast.ai course.
| squidsoup wrote:
| It might be worth pointing out that the banjo model is for a four
| string banjo, given a five string banjo is the more common
| instrument.
___________________________________________________________________
(page generated 2023-12-19 23:00 UTC)