[HN Gopher] Highres Spectrograms with the DFT Shift Theorem
___________________________________________________________________
Highres Spectrograms with the DFT Shift Theorem
Author : gbh444g
Score : 47 points
Date : 2021-05-03 20:22 UTC (2 hours ago)
(HTM) web link (soundshader.github.io)
(TXT) w3m dump (soundshader.github.io)
| andai wrote:
| Just a heads up, you have to click the images to see the full
| resolution version! I spent a good while confused about not being
| able to see the details mentioned in the images.
| gbh444g wrote:
| Hello HN! Author here. I was thinking of calling the post "The
| underappreciated complexity of musical sounds" but decided to
| stick with the DFT one as it would probably get more attention.
| This is a small discovery I came across this weekend. FFT-based
| spectrograms of musical instruments aren't a novel thing to do,
| but I thought: what if I do a super highres spectrogram with a
| continuum of frequencies, instead of the N fixed ones the FFT
| gives? Turns out, the FFT "supports" such frequency shifting by
| multiplying the input by a specially constructed complex
| exponential. As a result, I've
| found out that musical instruments produce sophisticated
| ornaments in between the harmonic levels.
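|
| A minimal NumPy sketch of the shift idea described above, not the
| author's actual code. The function names, the sign convention and
| the 64-sample test tone are assumptions made for illustration:
| multiplying the input by a complex exponential makes the FFT
| sample the spectrum at fractional bin positions.

```python
import numpy as np

def dft_at_offset(x, offset):
    """DFT of x evaluated at bins m + offset (offset in bins, may be fractional)."""
    n = len(x)
    k = np.arange(n)
    # Shift theorem: modulating the input moves the frequencies the FFT samples.
    # The minus sign is a convention chosen here so bins move up by `offset`.
    return np.fft.fft(x * np.exp(-2j * np.pi * k * offset / n))

def fine_spectrum(x, steps):
    """Interleave `steps` shifted FFTs; entry m*steps + j is bin m + j/steps."""
    rows = [dft_at_offset(x, j / steps) for j in range(steps)]
    return np.stack(rows, axis=1).reshape(-1)

# Hypothetical test tone sitting a quarter of the way between bins 10 and 11.
N = 64
tone = np.exp(2j * np.pi * 10.25 * np.arange(N) / N)
coarse = np.abs(np.fft.fft(tone))
fine = np.abs(fine_spectrum(tone, steps=4))
print(coarse.max(), fine.max(), fine.argmax() / 4)
```

| The plain FFT never lands exactly on the tone, so its peak is
| lower than the tone's full amplitude; the shifted FFTs include a
| sample at bin 10.25 and recover it.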
| CyberRabbi wrote:
| My mind is kind of blown that birdsong virtually does not
| include higher harmonics. I didn't even think that was possible
| for a physical resonator. Great post
| ttoinou wrote:
| Maybe they were not captured by the bandwidth-limited
| microphone?
| akomtu wrote:
| I think the mystery has a simple explanation: when a bird
| sings at 7 kHz and the mp3 file captures only the first 20 kHz,
| there isn't much room for harmonics. Maybe birds do have
| interesting harmonics at 56 kHz, we just don't know.
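|
| The arithmetic behind this can be sketched in a few lines. The
| 7 kHz, 20 kHz and 56 kHz figures come from the comment; the
| helper names are hypothetical.

```python
def harmonics_below(f0_hz, cutoff_hz):
    """Integer multiples of f0_hz that fit at or below cutoff_hz."""
    out = []
    n = 1
    while n * f0_hz <= cutoff_hz:
        out.append(n * f0_hz)
        n += 1
    return out

def nyquist_rate(f_hz):
    """The sample rate must exceed this value to represent f_hz."""
    return 2 * f_hz

visible = harmonics_below(7_000, 20_000)   # fundamental plus overtones that fit
needed = nyquist_rate(56_000)              # rate that a 56 kHz harmonic would require
print(visible, needed)
```

| Only the fundamental and one overtone fit under the cutoff, and
| the 8th harmonic at 56 kHz would need a recording sampled at more
| than 112 kHz.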
| stainforth wrote:
| What is an ornament?
| cviilgan wrote:
| Did I understand this correctly, what you are doing is
| essentially:
|
| X[n] = F[x[k]][n/2] if (n even) else F[x'[k]][(n+1)/2]
|
| With F[x[k]] the DFT of the time-domain signal x[k], x'[k] =
| x[k]*exp(2*pi*i*k*alpha) and this alpha some constant which
| yields a frequency-domain shift by 25Hz.
|
| If so: How does this method compare to zero-padding the time-
| domain signal (i.e. sinc-interpolating the frequency domain)?
| It is an interesting concept, but alas it's not immediately
| clear to me how to analyze this...
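|
| One way to probe the question numerically, as a sketch under
| assumptions rather than a statement about the article's code: take
| alpha as exactly half a bin (0.5/N cycles per sample), interleave
| the two DFTs as in the formula above, and compare with a DFT of
| the signal zero-padded to twice its length. The function names and
| the random test signal are hypothetical.

```python
import numpy as np

def interleaved_shifted_dft(x):
    """X[n] = F[x][n/2] for even n, F[x'][(n+1)/2 mod N] for odd n."""
    n = len(x)
    k = np.arange(n)
    alpha = 0.5 / n  # half a bin, in cycles per sample (assumed value)
    plain = np.fft.fft(x)
    shifted = np.fft.fft(x * np.exp(2j * np.pi * k * alpha))
    out = np.empty(2 * n, dtype=complex)
    out[0::2] = plain
    # Odd output index 2j+1 reads shifted[(j+1) mod n], hence the roll by -1.
    out[1::2] = np.roll(shifted, -1)
    return out

def zero_padded_dft(x):
    """DFT of x followed by len(x) zeros (np.fft.fft pads when n > len(x))."""
    return np.fft.fft(x, n=2 * len(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
same = np.allclose(interleaved_shifted_dft(x), zero_padded_dft(x))
print(same)
```

| For this half-bin case the two computations sample the same
| underlying spectrum at the same frequencies, so any difference
| between the approaches would have to come from windowing or from
| how finely the shift is stepped.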
| Lichtso wrote:
| On that note, also check out wavelets to generate spectrograms:
| https://en.wikipedia.org/wiki/Wavelet
|
| I have some implementations here: https://github.com/Lichtso/CCWT
| https://github.com/Lichtso/WebSpectrogram
| crazygringo wrote:
| This looks cool! But really needs "before" and "after" comparison
| images -- lo-res vs hi-res.
|
| Seeing the hi-res images only gives me no idea what kind of
| improvement this is showing...
|
| @gbh444g Hope you could maybe add some lo-res versions :)
|
| (Would also be cool to have audio clips next to each image as
| well, but that's less important.)
| crazygringo wrote:
| > _Smoothness in the time direction is easier to achieve: the
| 1024 bins window can be advanced by arbitrarily small time
| steps._
|
| It appears you're doing just that, but the time "width" is still
| readily apparent in many of the spectrograms, most obviously on
| the birdsong ones -- almost like a horizontal motion blur.
|
| Would a deconvolution filter be able to meaningfully horizontally
| "deblur" the spectrograms? So the birdsongs didn't appear to be
| drawn with a wide-tip marker, but rather a ballpoint pen? So not
| just hi-res, but hi-focus.
| LeegleechN wrote:
| It's unfortunate that the article doesn't get into the
| fundamental limits of spectrogram resolution which are based on
| the famous uncertainty principle
| (https://en.wikipedia.org/wiki/Fourier_transform#Uncertainty_...).
| For example, there is a
| fundamental tradeoff between frequency resolution and time
| resolution similar to the position/momentum tradeoff in quantum
| mechanics. The Continuous Wavelet Transform which is alluded to
| in the article is a way to tune that tradeoff by frequency bin to
| best align with human sound perception.
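|
| A rough numerical illustration of the tradeoff, using the simple
| rectangular-window rule of thumb rather than the formal
| inequality. The 1024-sample window is the size quoted from the
| article; the 48 kHz sample rate and the function name are
| assumptions.

```python
def window_resolution(n_samples, sample_rate_hz):
    """Return (window duration in seconds, FFT bin spacing in Hz)."""
    duration_s = n_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / n_samples
    return duration_s, bin_spacing_hz

SAMPLE_RATE_HZ = 48_000  # assumed, not stated in the thread

for n in (256, 1024, 4096):
    duration_s, spacing_hz = window_resolution(n, SAMPLE_RATE_HZ)
    # The product is always 1: a longer window buys finer frequency
    # spacing at the cost of a wider time smear, and vice versa.
    print(n, duration_s, spacing_hz, duration_s * spacing_hz)
```

| Interpolating between bins draws a smoother picture but does not
| change this product; only the window length does.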
| andai wrote:
| I've been wondering about the apparent contradiction between
| the limitations of spectrograms and the remarkable fidelity of
| MP3 files, which I thought operated along similar lines.
|
| When you convert a spectrogram back into sound it sounds like
| crap, but then how does MP3 store the frequency information
| (and why can't we use that for visualizations)?
|
| The math is beyond my understanding, can anyone give some kind
| of analogy maybe?
| achillesheels wrote:
| My hypothesis: it is stored magnetically (after all magnetic
| sinusoidals exist) and converted electrically once the mp3 is
| activated in time.
| bad_username wrote:
| I implemented a simple clone of mp3 and it was not that hard.
| If you do a discrete Fourier transform of the audio (in small
| overlapping windows), quantize the resulting coefficients,
| and compress them losslessly using Huffman codes, you
| will end up with something not that far from mp3. The human
| ear is quite forgiving to the effects of quantization in
| frequency domain.
|
| MP3 does not have remarkable fidelity though. MP3, and my
| clone of it, suffers from time domain artifacts. Quantization
| in the frequency domain causes distortion in the time domain
| as well, negatively affecting high frequency transient sounds
| like cymbals. That is more noticeable. Newer generation
| codecs like AAC handle transients much better, but they are
| considerably more advanced, and often use different
| transforms like wavelet transform.
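|
| A toy sketch of the transform-and-quantize part of that recipe,
| not the commenter's code and not MP3 itself. The Huffman stage is
| omitted, there is no psychoacoustic model, and the frame size,
| window, quantizer step and function name are all assumptions.

```python
import numpy as np

def toy_codec(signal, frame=256, step=0.05):
    """Window, FFT, quantize, inverse FFT and overlap-add a mono signal."""
    hop = frame // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)
    n_frames = int(np.ceil(len(signal) / hop)) + 1
    # Pad half a frame on each side so every sample sits under two windows.
    padded = np.zeros(hop * (n_frames + 1))
    padded[hop:hop + len(signal)] = signal
    out = np.zeros_like(padded)
    for i in range(n_frames):
        start = i * hop
        coeffs = np.fft.rfft(padded[start:start + frame] * window)
        # Rounding to integer multiples of `step` is the lossy part; these
        # integers are what an entropy coder would then compress.
        quantized = np.round(coeffs / step)
        out[start:start + frame] += np.fft.irfft(quantized * step, n=frame)
    return out[hop:hop + len(signal)]

rng = np.random.default_rng(0)
t = np.arange(4000) / 8000.0
audio = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(t.size)

def rms_error(step):
    return np.sqrt(np.mean((toy_codec(audio, step=step) - audio) ** 2))

print(rms_error(0.01), rms_error(0.5))
```

| A coarser step leaves fewer distinct integers to store but a
| larger reconstruction error, which is the basic size versus
| quality knob in this kind of scheme.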
| gugagore wrote:
| The general concepts are described here:
| https://en.m.wikipedia.org/wiki/Psychoacoustics
|
| I'm not sure what you mean by converting the spectrogram to
| sound, but my guess is that the windowing done on the short-
| time Fourier transform is causing artifacts.
| jcelerier wrote:
| > When you convert a spectrogram back into sound it sounds
| like crap
|
| The FFT gives you the magnitude spectrum plus the phase. If you
| only use the magnitudes to resynthesise, you're missing half the
| information. The time domain <-> frequency domain transform is
| 100% lossless in both directions.
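|
| A short NumPy check of this point, as an illustrative sketch with
| hypothetical function names and a random test signal: the full
| complex spectrum inverts back to the input, while the magnitudes
| alone do not.

```python
import numpy as np

def roundtrip(x):
    """Forward and inverse FFT keeping magnitude and phase."""
    return np.fft.irfft(np.fft.rfft(x), n=len(x))

def magnitude_only(x):
    """Inverse FFT after discarding the phase (all phases set to zero)."""
    return np.fft.irfft(np.abs(np.fft.rfft(x)), n=len(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(512)

lossless = np.allclose(roundtrip(x), x)
phase_dropped_matches = np.allclose(magnitude_only(x), x)
print(lossless, phase_dropped_matches)
```

| A spectrogram image normally stores only magnitudes, so turning it
| back into sound requires guessing or estimating the phase.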
| efnx wrote:
| I love this and have been looking for a program that's like
| Photoshop for sound.
| layoutIfNeeded wrote:
| You can try interpreting images as spectrograms, but the result
| will be a cacophonic mess.
|
| There's a reason why nobody does this. (Other than avant-garde
| experimental composers maybe, but they _are_ looking for
| cacophony)
| [deleted]
___________________________________________________________________
(page generated 2021-05-03 23:00 UTC)