[HN Gopher] The ear does not do a Fourier transform
___________________________________________________________________
The ear does not do a Fourier transform
Author : izhak
Score : 291 points
Date : 2025-10-30 17:01 UTC (5 hours ago)
(HTM) web link (www.dissonances.blog)
(TXT) w3m dump (www.dissonances.blog)
| p0w3n3d wrote:
| Tbh I used to think that it does. For example, when playing
| higher notes, it's harder to hear the out-of-tune frequencies
| than on the lower notes.
| fallingfrog wrote:
  | I haven't noticed that effect, to be honest. Actually I think
  | it's the really low bass frequencies that are harder to tune,
  | especially if you remove the harmonics and just leave the
  | fundamental.
|
| Are you perhaps experiencing some high frequency hearing loss?
| jacquesm wrote:
| It's even more complex than that. The low notes are hard to
| tune because the fundamentals are very close to each other
| and you need to have super good hearing to match the beats,
| fortunately they sound for a long time so that helps. Missing
| fundamentals are a funny thing too, you might not be
| 'hearing' what you think you hear at all! The high notes are
| hard to tune because they sound very briefly (definitely on a
| piano) and even the slightest movement of the pin will change
| the pitch considerably.
|
| In the middle range (say, A2 through A6) neither of these
| issues apply, so it is - by far - the easiest to tune.
| TheOtherHobbes wrote:
      | See also, psychoacoustics. The ear doesn't _just_ do
      | frequency decomposition. It's not clear if it even does
      | frequency decomposition. What actually happens is a lot of
      | perceptual modelling and relative amplitude masking, which
      | makes it possible to do real-time source separation.
|
| Which is why we can hear individual instruments in a mix.
|
| And this ability to separate sources can be trained. Just
| as pitch perception can be trained, with varying results
| from increased acuity up to full perfect pitch.
|
| A component near the bottom of all that is range-based
| perception of consonance and dissonance, based on the
| relationships between beat frequencies and fundamentals.
|
| Instead of a vanilla Fourier transform, frequencies are
| divided into multiple critical bands (q.v.) with different
| properties and effects.
|
| What's interesting is that the critical bands seem to be
| dynamic, so they can be tuned to some extent depending on
| what's being heard.
|
| Most audio theory has a vanilla EE take on all of this,
| with concepts like SNR, dynamic range, and frequency
| resolution.
|
| But the experience of audio is hugely more complex. The
| brain-ear system is an intelligent system which actively
| classifies, models, and predicts sounds, speech, and music
| as they're being heard, at various perceptual levels, all
| in real time.
| jacquesm wrote:
        | Yes, indeed, to think of the ear as the thing that hears
        | is already a huge error. The ear is - at best - a faulty
        | transducer with its own unique way of turning air
        | pressure variations into nerve impulses, and what the
        | brain does with those impulses is as much a part of
        | hearing as the mechanics of the ear. A computer keyboard
        | likewise does not interpret your keystrokes; it just
        | turns them into electrical signals.
| fallingfrog wrote:
      | Welll. On guitar you can't really use the "matching the
      | beats" trick, or the one where you play the 4th on the
      | string below and make them sound in unison, because if you
      | do that all the way up the neck your guitar will be tuned
      | to just intonation instead of equal temperament and certain
      | chords will sound very bad. Four perfect 4ths and a major
      | 3rd don't quite add up to two octaves. It's better to
      | reference everything to the low E string and just kind of
      | know where the pitches are supposed to land.
|
| That's a side note, the rest of what you wrote was very
| informative!
| philip-b wrote:
| No, it's vice versa. If two wind instruments play unison
| slightly out of tune from each other, it will be very
| noticeable. If the bass is slightly out of tune or mistakenly
| plays a different note a semitone up or down, it's easy to not
| notice it.
| bloppe wrote:
| Man, I've been spreading disinformation for years.
| rolph wrote:
    | the closest i have been was acoustic phase discrimination by
    | owls.
    |
    | there appears to be no software for this, it's all hardware:
    | the signal format flips as it travels through the anatomy.
| nakulgarg22 wrote:
| This might be interesting for you -
| https://nakulg.com/assets/papers/owlet_mobisys2021_nakul.pdf
|
| Owls use asymmetric skull structure which helps them in
| spatial perception of sound.
| rolph wrote:
| that was the start of it. the offset otic openings result
| in differential arrival times of the acoustic peaks, thus
| phase differential.
|
| neurosynaptically, there is no phase, there is frequency
| shift corresponding to presynaptic intensity, and there is
| spatio-temporal integration of these signals. temporal
| integration is where "phase" matters
|
        | it's all a mix of "digital" all-or-nothing "gates" and
        | analog frequency-shift propagation of the "gate" output.
        |
        | it's all made nebulous by the adaptive and hysteretic
        | nature of the elements in neural "circuitry"
| lukeinator42 wrote:
| also, the common ancestor of mammals and birds did not have a
| tympanic ear, so sound localization evolved differently in
| the avian vs. mammalian hearing systems. A good review is
| here: https://journals.physiology.org/doi/pdf/10.1152/physrev
      | .0002.... How the brain calculates interaural time delays
      | is actually an interesting problem, as the delays are
      | shorter than the time a neuron needs to fire an action
      | potential.
| saltcured wrote:
    | This is one of those pedant vs cocktail chatterer
    | distinctions. It's an interesting dive and makes for a nice,
    | triggering headline.
|
| But, to the vast majority who don't really know or care about
| the math, "Fourier Transform" is, at best, a totem for the
| entire concept space of "frequency domain", "spectral
| decomposition", etc.
|
| They are not making fine distinctions of tradeoffs among
| different methods. I'm not sure I'd even call it disinformation
| to tell this hand-wavy story and pique someone's interest in a
| topic they otherwise never thought about...
| rolph wrote:
| FT is a frequency-domain representation.
|
| neural signaling by action potential is also a representation of
| intensity by frequency.
|
| the cochlea is where you can begin to talk about bio-FT
| phenomena.
|
| however the format "changes" along the signal path whenever a
| synapse occurs.
| xeonmc wrote:
| Nit: It's an unfortunate confusion of naming conventions, but
| Fourier Transform in the strictest sense implies an infinite
| "sampling" period, while the finite "sample" period counterpart
| would correspond to the Fourier _Series_, even though we
| colloquially refer to them interchangeably.
|
| (I put "sampling" in quotes as it's actually the "integration
| period" in this context of continuous-time integration, though
| that would be less immediately evocative of the concept people
| are colloquially familiar with. If we further impose a
| constraint of finite temporal resolution so that it is honest-
| to-god "sampling", then it becomes the Discrete Fourier
| Transform, of which the Fast Fourier Transform is one
| implementation.)
|
| It is this strict definition that the article title is rebuking,
| but it's not quite what the colloquial usage loosely evokes in
| most people's minds when we usually say Fourier Transform as an
| analysis tool.
|
| So this article should have been comparing to Fourier Series
| analysis rather than Fourier Transform in the pedantic sense,
| albeit that'll be a bit less provocative.
|
| Regardless, it doesn't at all take away from the salient points
| of this excellent article, which offer a really interesting
| reframing of the concepts: what the ear does mechanistically is
| apply a temporal "weighting function" (filter), so it's
| somewhere between a Fourier series and a Fourier transform. The
| article hits the nail on the head in presenting the sliding
| scale of conjugate-domain trade-offs (think: Heisenberg).
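|
| To make that sliding scale concrete, here's a minimal numpy
| sketch (window length and shape are arbitrary choices): the
| temporal weighting applied before the transform is exactly the
| knob that moves you between the series-like and transform-like
| limits.
|
|   import numpy as np
|
|   fs = 48000                          # sample rate, Hz
|   t = np.arange(fs) / fs              # 1 second of time
|   x = np.sin(2 * np.pi * 1000 * t)    # a pure 1 kHz tone
|
|   # Rectangular weighting: a finite "integration period",
|   # i.e. the Fourier-series-like end of the scale.
|   rect = np.zeros(fs)
|   rect[:480] = 1.0                    # observe only 10 ms
|
|   # Gaussian weighting: the Gabor-like middle of the scale.
|   n = np.arange(fs)
|   gauss = np.exp(-0.5 * ((n - 240) / 80.0) ** 2)
|
|   for win in (rect, gauss):
|       X = np.abs(np.fft.rfft(x * win))
|       lobe = (X > 0.5 * X.max()).sum()   # main-lobe width in Hz
|       print(X.argmax(), lobe)            # peak sits near 1000 Hz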
| meowkit wrote:
  | I was a bit peeved by the title, but I think it's a fair use
  | of clickbait, as the article has a lot of little details about
  | acoustics in humans that I was unfamiliar with (i.e. a link to
  | a primer on the transduction implementation of cochlear
  | cilia).
|
| But yeah there is a strict vs colloquial collision here.
| BrenBarn wrote:
| Yeah, it's sort of like saying the ear doesn't do "a" Fourier
| transform, it does a bunch of Fourier transforms on samples of
| data, with a varying tradeoff between temporal and frequency
| resolution. But most people would still say that's doing a
| Fourier transform.
|
| As the article briefly mentions, it's a tempting hypothesis
| that there is a relationship between the acoustic properties of
| human speech and the physical/neural structure of the auditory
| system. It's hard to get clear evidence on this but a lot of
| people have a hunch that there was some coevolution involved,
| with the ear's filter functions favoring the frequency ranges
| used by speech sounds.
| foobarian wrote:
| This is something you quickly learn when you read the theory
| in the textbook, get excited, and sit down to write some code
| and figure out that you'll have to pick a finite buffer size.
| :-)
| aidenn0 wrote:
| > ...it's a tempting hypothesis that there is a relationship
| between the acoustic properties of human speech and the
| physical/neural structure of the auditory system.
|
| This seems trivially true in the sense that human speech is
| intelligible by humans; there are many sounds that humans
| cannot hear and/or distinguish, and speech does not involve
| those.
| tryauuum wrote:
| man I need to finally learn what a Fourier transform is
| TobTobXX wrote:
| 3Blue1Brown has a really good explanation here:
| https://www.youtube.com/watch?v=spUNpyF58BY
|
| It gave me a much better intuition than my math course.
| jama211 wrote:
| Hahaha, I was working on learning these in second year uni...
| which was also exactly when I switched from an electrical
| engineering focussed degree to a software one!
|
| Perhaps finally I should learn too...
| garbageman wrote:
| It's an absolutely brilliant bit of maths that breaks a complex
| waveform into the individual components. Kind of like taking an
| orchestral song and then working out each individual
| instrument's contribution. Learning about this left me honestly
| aghast and in shock that it's not only possible but that
| someone (Joseph Fourier) figured it out and then shared it with
| the world.
|
| This video does a great job explaining what it is and how it
| works to the layman. 3blue1brown -
| https://www.youtube.com/watch?v=spUNpyF58BY
| adzm wrote:
  | the very simplest way to describe it: it is what turns a
  | waveform (amplitude x time) into a spectrum like the analyzer
  | display on a stereo (amplitude x frequency)
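  |
  | in numpy terms (a toy sketch; the two tone frequencies are made
  | up):
  |
  |   import numpy as np
  |
  |   fs = 8000                          # sample rate, Hz
  |   t = np.arange(fs) / fs             # one second of samples
  |   x = (np.sin(2 * np.pi * 440 * t)           # waveform:
  |        + 0.5 * np.sin(2 * np.pi * 880 * t))  # amplitude x time
  |
  |   mag = np.abs(np.fft.rfft(x))       # amplitude x frequency
  |   freqs = np.fft.rfftfreq(len(x), 1 / fs)
  |   print(freqs[mag.argsort()[-2:]])   # -> [880. 440.]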
| Chabsff wrote:
| And phase. People always forget about the phase as if it was
| purely imaginary.
| JKCalhoun wrote:
| Ha ha, as I understand it, phase is imaginary in a Fourier
| transform. Complex numbers are used and the imaginary
| portion does indeed represent phase.
|
| I have been told that reversing the process -- creating a
| time-based waveform -- will not resemble (visually) the
| original due to this phase loss in the round-tripping. But
| then our brain never paid phase any mind so it will sound
| the same to our ears. (Yay, MP3!)
| Chabsff wrote:
| I'm glad someone picked up on my dumb joke :), I was
| getting worried.
|
| That being said, round-tripping works just fine,
| axiomatically so, until you go out of your way to discard
| the imaginary component.
| DonHopkins wrote:
| Even more complex and reflectively imaginative than the
| Fourier Transform is the mighty Cepstrum!
|
| https://en.wikipedia.org/wiki/Cepstrum
|
| It's literally a "backwards spectrum", and the authors in
| 1963 were having such jolly fun they reversed the words
| too: quefrency => frequency, saphe => phase, alanysis =>
| analysis, liftering => filtering
|
| The cepstrum is the "spectrum of a log spectrum," where
| taking the complex logarithm turns multiplicative
| spectral features into additive ones, laying the
| foundation of cepstral alanysis, and later, the
| physiologically tuned Mel-frequency cepstrum used in
| audio compression and speech recognition.
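          |
          | As a toy numpy sketch, the real cepstrum is just that
          | recipe (log magnitude only, ignoring saphe entirely):
          |
          |   import numpy as np
          |
          |   def real_cepstrum(x):
          |       # spectrum -> log -> spectrum again, so that
          |       # multiplicative spectral structure becomes
          |       # additive peaks along the quefrency axis
          |       mag = np.abs(np.fft.rfft(x)) + 1e-12
          |       return np.fft.irfft(np.log(mag))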
|
| https://en.wikipedia.org/wiki/Mel_scale
|
| >The mel scale (after the word melody)[1] is a perceptual
| scale of pitches judged by listeners to be equal in
| distance from one another. [...] Use of the mel scale is
| believed to weigh the data in a way appropriate to human
| perception.
|
| As Tukey might say: once you start doing cepstral
| alanysis, there's no turning back, except inversely.
|
| Skeptics said he was just going through a backwards
| phase, but it turned out to work! ;)
|
| https://news.ycombinator.com/item?id=24386845
|
| DonHopkins on Sept 5, 2020 | parent | context | favorite
| | on: Mathematicians should stop naming things after
| eac...
|
| I love how they named the inverse spectrum the cepstrum,
| which uses quefrency, saphe, alanysis, and liftering,
| instead of frequency, phase, analysis and filtering. It
| should not be confused with the earlier concept of the
| kepstrum, of course! ;)
|
| https://en.wikipedia.org/wiki/Cepstrum
|
| >References to the Bogert paper, in a bibliography, are
| often edited incorrectly. The terms "quefrency",
| "alanysis", "cepstrum" and "saphe" were invented by the
| authors by rearranging some letters in frequency,
| analysis, spectrum and phase. The new invented terms are
| defined by analogies to the older terms.
|
| >Thus: The name cepstrum was derived by reversing the
| first four letters of "spectrum". Operations on cepstra
| are labelled quefrency analysis (aka quefrency
| alanysis[1]), liftering, or cepstral analysis. It may be
| pronounced in the two ways given, the second having the
| advantage of avoiding confusion with "kepstrum", which
| also exists (see below). [...]
|
| >The kepstrum, which stands for "Kolmogorov-equation
| power-series time response", is similar to the cepstrum
| and has the same relation to it as expected value has to
| statistical average, i.e. cepstrum is the empirically
| measured quantity, while kepstrum is the theoretical
| quantity. It was in use before the cepstrum.[12][13]
|
| https://news.ycombinator.com/item?id=43341806
|
| DonHopkins 7 months ago | parent | context | favorite |
| on: What makes code hard to read: Visual patterns of c...
|
| Speaking of filters and clear ergonomic abstractions, if
| you like programming languages with keyword pairs like
| if/fi, for/rof, while/elihw, goto/otog, you will LOVE the
| cabkwards covabulary of cepstral quefrency alanysis,
| invented in 1963 by B. P. Bogert, M. J. Healy, and J. W.
| Tukey:
|
| cepstrum: inverse spectrum
|
| lifter: inverse filter
|
| saphe: inverse phase
|
| quefrency alanysis: inverse frequency analysis
|
| gisnal orpcessing: inverse signal processing
|
| https://en.wikipedia.org/wiki/Cepstrum
|
| https://news.ycombinator.com/item?id=44062022
|
| DonHopkins 5 months ago | parent | context | favorite |
| on: The scientific "unit" we call the decibel
|
| At least the Mel-frequency cepstrum is honest about being
| a perceptual scale anchored to human hearing, rather than
| posing as a universally-applicable physical unit.
|
| https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
|
| >Mel-frequency cepstral coefficients (MFCCs) are
| coefficients that collectively make up an MFC. They are
| derived from a type of cepstral representation of the
| audio clip (a nonlinear "spectrum-of-a-spectrum"). The
| difference between the cepstrum and the mel-frequency
| cepstrum is that in the MFC, the frequency bands are
| equally spaced on the mel scale, which approximates the
| human auditory system's response more closely than the
| linearly-spaced frequency bands used in the normal
| spectrum. This frequency warping can allow for better
| representation of sound, for example, in audio
| compression that might potentially reduce the
| transmission bandwidth and the storage requirements of
| audio signals.
|
| https://en.wikipedia.org/wiki/Psychoacoustics
|
| >Psychoacoustics is the branch of psychophysics involving
| the scientific study of the perception of sound by the
| human auditory system. It is the branch of science
| studying the psychological responses associated with
| sound including noise, speech, and music. Psychoacoustics
| is an interdisciplinary field including psychology,
| acoustics, electronic engineering, physics, biology,
| physiology, and computer science.
| xeonmc wrote:
        | Actually, by the Kramers-Kronig relation you can infer
        | the imaginary part just from the real part, given that
        | your time signal is causal. So the phase isn't actually
        | lost in any way at all, if you assume causality.
        |
        | Also, pedantic nit: phase would be the imaginary
        | _exponent_ of the spectrum rather than the imaginary part
        | directly, i.e., you take the logarithm of the complex
        | amplitude to get log-magnitude (real) plus phase (imag).
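        |
        | That nit in a few lines of numpy (toy numbers):
        |
        |   import numpy as np
        |
        |   z = 3.0 * np.exp(0.5j)    # magnitude 3, phase 0.5 rad
        |   print(np.log(z).real)     # log-magnitude: ln(3)
        |   print(np.log(z).imag)     # phase: 0.5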
| CGMthrowaway wrote:
| It's a Copy>Paste Special>Transpose on a waveform, converting
| Rows/Columns that are time/amplitude (with wavelength embedded)
| into Rows/Columns that are frequency/amplitude (for a snapshot
| in time).
|
| People love to go on about how brilliant it is and they're
| probably right but that's how I understand it.
| TheOtherHobbes wrote:
| Pretty much, but phase is also included. Which matters for
| some things.
| bobmcnamara wrote:
| But mostly not for ears it turns out!
|
| Phase matters for some wideband signals, but most folks
| struggle to tell apart audio from hilbert-90-degree-
| shifted-audio
| xeonmc wrote:
      | Phase is required if it is to be a reversible transform.
      | Otherwise it would just be a functional.
| anigbrowl wrote:
  | Read this (which is free): _The Scientist's and Engineer's
  | Guide to Digital Signal Processing_: https://www.dspguide.com
  |
  | It's very comprehensive, but it's also very well written and
  | walks you through the mechanics of Fourier transforms in a way
  | that makes them intuitive.
| dsego wrote:
| humble plug https://dsego.github.io/demystifying-fourier/
| edbaskerville wrote:
| To summarize: the ear does not do a Fourier transform, but it
| does do a time-localized frequency-domain transform akin to
| wavelets (specifically, intermediate between wavelet and Gabor
| transforms). It does this because the sounds processed by the ear
| are often localized in time.
|
| The article also describes a theory that human speech evolved to
| occupy an unoccupied space in frequency vs. envelope duration
| space. It makes no explicit connection between that fact and the
| type of transform the ear does--but one would suspect that the
| specific characteristics of the human cochlea might be tuned to
| human speech while still being able to process environmental and
| animal sounds sufficiently well.
|
| A more complicated hypothesis off the top of my head: the
| location of human speech in frequency/envelope is a tradeoff
| between (1) occupying an unfilled niche in sound space; (2)
| optimal information density taking brain processing speed into
| account; and (3) evolutionary constraints on physiology of sound
| production and hearing.
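|
| For the "intermediate between wavelet and Gabor" point, here is
| a rough numpy sketch of the two endpoints (all constants are
| arbitrary): a Gabor bank keeps the envelope width fixed across
| frequency, a wavelet bank scales it with the period, and the
| cochlea's filters reportedly sit somewhere in between.
|
|   import numpy as np
|
|   def gabor_kernel(f, fs, width_s=0.010):
|       # fixed-width Gaussian envelope at every frequency
|       t = np.arange(-0.05, 0.05, 1 / fs)
|       env = np.exp(-0.5 * (t / width_s) ** 2)
|       return env * np.cos(2 * np.pi * f * t)
|
|   def wavelet_kernel(f, fs, cycles=6):
|       # envelope width proportional to the period: constant Q
|       t = np.arange(-0.05, 0.05, 1 / fs)
|       sigma = cycles / (2 * np.pi * f)
|       env = np.exp(-0.5 * (t / sigma) ** 2)
|       return env * np.cos(2 * np.pi * f * t)
|
|   # convolving a signal against a bank of either kernel at
|   # log-spaced frequencies gives the two ends of the scale
|   bank = [wavelet_kernel(f, 16000) for f in (250, 500, 1000)]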
| AreYouElite wrote:
  | Do you believe it might be possible that the frequency band of
  | human speech is not determined by such factors at all, but is
  | more a function of height? Kids have higher voices; adults
  | have deeper voices. Similar to stringed instruments: the viola
  | is high-pitched and the bass low-pitched.
  |
  | I'm no expert in these matters, just speculating...
| fwip wrote:
| It's not height, but vocal cord length and thickness. Longer
| vocal cords (induced by testosterone during puberty) vibrate
| more slowly, with a lower frequency/pitch.
| matthewdgreen wrote:
| If you take this thought process even farther, specific words
| and phonemes should occupy specific slices of the tradeoff
| space. Across all languages and cultures, an immediate warning
| that a tiger is about to jump on you should sit in a different
| place than a mother comforting a baby (which, of course, it
| does.) Maybe that even filters down to ordinary conversational
| speech.
| xeonmc wrote:
| Analogy: when you knock on doors, how do you decide what rhythm
| and duration to use, so that it won't be mistaken as
| accidentally hitting the door?
| toast0 wrote:
| Shave and a haircut is the only option in my knocking
| decision tree.
| cnity wrote:
| Thanks for giving your two bits on the matter.
| throwaway198846 wrote:
| ... What does that mean?
| crazygringo wrote:
| https://en.wikipedia.org/wiki/Shave_and_a_Haircut
| a-dub wrote:
| > At high frequencies, frequency resolution is sacrificed for
| temporal resolution, and vice versa at low frequencies.
|
| this is the time-frequency uncertainty principle. intuitively
| it can be understood by thinking about wavelength. the more
| stretched out the waveform is in time, the more of it you need
| to see in order to have a good representation of its frequency,
| but the more of it you see, the less precise you can be about
| where exactly it is.
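  |
  | a concrete version of that (numbers picked arbitrarily): two
  | tones 5 hz apart only resolve when the analysis window is long
  | enough.
  |
  |   import numpy as np
  |
  |   fs = 8000
  |   t = np.arange(2 * fs) / fs            # two seconds
  |   x = np.sin(2*np.pi*440*t) + np.sin(2*np.pi*445*t)
  |
  |   for win_s in (0.1, 1.0):              # short vs long look
  |       n = int(win_s * fs)
  |       spec = np.abs(np.fft.rfft(x[:n]))
  |       freqs = np.fft.rfftfreq(n, 1 / fs)
  |       print(f"{win_s}s window: {fs/n:.0f} hz bins, "
  |             f"peak at {freqs[spec.argmax()]:.0f} hz")
  |
  |   # the 0.1 s window has 10 hz bins, so 440 and 445 hz blur
  |   # together; the 1.0 s window has 1 hz bins and separates them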
|
| > but it does do a time-localized frequency-domain transform
| akin to wavelets
|
| maybe easier to conceive of first as an arbitrarily defined
| filter bank based on physiological results rather than trying
| to jump directly to some neatly defined set of orthogonal basis
| functions. additionally, orthogonal basis functions cannot, by
| definition, capture things like masking effects.
|
| > A more complicated hypothesis off the top of my head: the
| location of human speech in frequency/envelope is a tradeoff
| between (1) occupying an unfilled niche in sound space; (2)
| optimal information density taking brain processing speed into
| account; and (3) evolutionary constraints on physiology of
| sound production and hearing.
|
| (4) size of the animal.
|
  | notably: some smaller creatures have ultrasonic vocalization
  | and sensory capability. sometimes this is hypothesized to
  | complement visual perception for avoiding predators; it also
  | could just have a lot to do with the fact that, well, they
  | have tiny articulators and tiny vocalizations!
| Terr_ wrote:
| > it also could just have a lot to do with the fact that,
| well, they have tiny articulators and tiny vocalizations!
|
| Now I'm imagining some alien shrew with vocal-cords (or
| syrinx, or whatever) that runs the entire length of its body,
| just so that it can emit lower-frequency noises for some
| reason.
| bragr wrote:
| Well without the humorous size difference, this is
| basically what whales and elephants do for long distance
| communication.
| Terr_ wrote:
| Was playing around with a fundamental frequency
| calculator [0] to associate certain sizes to hertz, then
| using a tone-generator [1] to get a subjective idea of
| what it'd sound like.
|
| Though of course, nature has plenty of other tricks, like
| how Koalas can go down to ~27hz. [2]
|
| [0] https://acousticalengineer.com/fundamental-frequency-
| calcula...
|
| [1] https://www.szynalski.com/tone-generator/
|
| [2] https://www.nature.com/articles/nature.2013.14275
| Y_Y wrote:
| Sounds like an antenna, if you'll accept electromagnetic
| noise then there are some fish that could pass for your
| shrew, e.g. https://en.wikipedia.org/wiki/Gymnotus
| SoftTalker wrote:
| Ears evolved long before speech did. Probably in step with
| vocalizations however.
| Sharlin wrote:
| Not sure about that; I'd guess that vibration-sensing organs
| first evolved to sense disturbances (in water, on seafloor,
| later on dry ground and in air) caused by movement, whether
| of a predator, prey, or a potential mate. Intentional
| vocalizations for signalling purposes then evolved to utilize
| the existing modality.
| FarmerPotato wrote:
| Is that an human understanding or is it just an AI that read
| the text and ignored the pictures?
|
| Why do we need a summary in a post that adds nothing new to the
| conversation?
| pests wrote:
| Are you saying your parent post was an AI summary? There is
| original speculation at the end and it didn't come off that
| way to me.
| dsp_person wrote:
| Even if it is doing a wavelet transform, I still see that as
| made of Fourier transforms. Not sure if there's a good way to
| describe this.
|
  | We can build a short-time Fourier transform or a wavelet
  | transform the same way, either by:
  |
  | - a filterbank approach, integrating signals in time
  |
  | - taking Fourier transforms of time slices, integrating in
  | frequency
  |
  | The same machinery, just with different filters.
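  |
  | A sanity check of that equivalence on one 64-sample slice (a
  | toy sketch): the FFT of a windowed slice and a bank of windowed
  | complex-sinusoid filters give identical coefficients.
  |
  |   import numpy as np
  |
  |   rng = np.random.default_rng(0)
  |   x = rng.standard_normal(64)        # any signal slice
  |   w = np.hanning(64)                 # analysis window
  |
  |   # view 1: window the time slice, then take its FFT
  |   X1 = np.fft.fft(x * w)
  |
  |   # view 2: a filterbank with one windowed complex
  |   # sinusoid per bin, each applied as a dot product
  |   n = np.arange(64)
  |   dft = np.exp(-2j * np.pi * np.outer(n, n) / 64)
  |   X2 = (dft * w) @ x
  |
  |   print(np.allclose(X1, X2))         # True: same machinery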
| psunavy03 wrote:
| > A more complicated hypothesis off the top of my head: the
| location of human speech in frequency/envelope is a tradeoff
| between (1) occupying an unfilled niche in sound space; (2)
| optimal information density taking brain processing speed into
| account; and (3) evolutionary constraints on physiology of
| sound production and hearing.
|
| Well from an evolutionary perspective, this would be
| unsurprising, considering any other forms of language would
| have been ill-fitted for purpose and died out. This is really
| just a flavor of the anthropic principle.
| lgas wrote:
| > It does this because the sounds processed by the ear are
| often localized in time.
|
| What would it mean for a sound to not be localized in time?
| littlestymaar wrote:
| A continuous sinusoidal sound, I guess?
| hansvm wrote:
| It would look like a Fourier transform ;)
|
| Zooming in to cartoonish levels might drive the point home a
| bit. Suppose you have sound waves
| |---------|---------|---------|
|
| What is the frequency exactly 1/3 the way between the first
| two wave peaks? It's a nonsensical question. The frequency
| relates to the time delta between peaks, and looking locally
| at a sufficiently small region of time gives no information
| about that phenomenon.
|
| Let's zoom out a bit. What's the frequency over a longer
| period of time, capturing a few peaks?
|
| Well...if you know there is only one frequency then you can
| do some math to figure it out, but as soon as you might be
| describing a mix of frequencies you suddenly, again,
| potentially don't have enough information.
|
| That lack of information manifests in a few ways. The exact
| math (Shannon's theorems?) suggests some things, but the
| language involved mismatches with human perception
| sufficiently that people get burned trying to apply it too
| directly. E.g., a bass beat with a bit of clock skew is very
| different from a bass beat as far as a careless decomposition
| is concerned, but it's likely not observable by a human
| listener.
|
| Not being localized in time means* you look at longer
| horizons, considering more and more of those interactions.
| Instead of the beat of a 4/4 song meaning that the frequency
| changes at discrete intervals, it means that there's a
| larger, over-arching pattern capturing "the frequency
| distribution" of the entire song.
|
| *Truly time-nonlocalized sound is of course impossible, so
| I'm giving some reasonable interpretation.
| jancsika wrote:
| > It's a nonsensical question.
|
| Are you talking about a discrete signal or a continuous
| signal?
| xeonmc wrote:
| Means that it is a broad spectrum signal.
|
| Imagine the dissonant sound of hitting a trashcan.
|
| Now imagine the sound of pressing down all 88 keys on a piano
| simultaneously.
|
| Do they sound similar in your head?
|
    | The localization happens where the phases of all the
    | frequency components align coherently and construct a pulse,
    | while further along in time their phases are misaligned and
    | cancel each other out.
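    |
    | A quick numpy illustration (the number of partials is
    | arbitrary): same magnitude spectrum, different phases.
    |
    |   import numpy as np
    |
    |   t = np.arange(4096) / 4096
    |   freqs = np.arange(1, 200)      # a broad bank of partials
    |   rng = np.random.default_rng(0)
    |
    |   # phases aligned at t = 0: components pile into a click
    |   click = sum(np.cos(2 * np.pi * f * t) for f in freqs)
    |
    |   # same partials with scrambled phases: noise-like hiss
    |   hiss = sum(np.cos(2 * np.pi * f * t
    |                     + rng.uniform(0, 2 * np.pi))
    |              for f in freqs)
    |
    |   print(click.max() / click.std())   # large peak factor
    |   print(hiss.max() / hiss.std())     # much smaller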
| patrickthebold wrote:
| I think I might be missing something basic, but if you actually
| wanted to do a Fourier transform on the sound hitting your ear,
| wouldn't you need to wait your entire lifetime to compute it?
| It seems pretty clear that's not what is happening, since you
| can actually hear things as they happen.
| xeonmc wrote:
| You'll also need to have existed and started listening before
| the beginning of time, forever and ever. Amen.
| cherryteastain wrote:
| Not really, just as we can create spectrograms [1] for a real
| time audio feed without having to wait for the end of the
| recording by binning the signal into timewise chunks.
|
| [1] https://en.wikipedia.org/wiki/Spectrogram
| IshKebab wrote:
| Those use the Short-Time Fourier Transform, which is very
| much like what the ear does.
|
| https://en.wikipedia.org/wiki/Short-time_Fourier_transform
| bonoboTP wrote:
    | Yes, for the vanilla Fourier transform you have to integrate
    | from negative to positive infinity. But more practically you
    | can put a temporally finite-support window function on it,
    | so you only analyze a part of the signal. Whenever you see a
    | 2d spectrogram image in audio editing software, where the
    | audio engineer can suppress a certain range of frequencies
    | in a certain time period, they use something like this.
    |
    | It's called the short-time Fourier transform (STFT).
    |
    | https://en.wikipedia.org/wiki/Short-time_Fourier_transform
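    |
    | A minimal scipy sketch of that (window size picked
    | arbitrarily):
    |
    |   import numpy as np
    |   from scipy import signal
    |
    |   fs = 8000
    |   t = np.arange(fs) / fs
    |   # a tone that jumps from 440 Hz to 880 Hz at t = 0.5 s
    |   x = np.where(t < 0.5,
    |                np.sin(2 * np.pi * 440 * t),
    |                np.sin(2 * np.pi * 880 * t))
    |
    |   # FFTs over short overlapping windows = STFT
    |   f, seg_t, Z = signal.stft(x, fs=fs, nperseg=256)
    |   print(Z.shape)   # (freq bins, segments): a spectrogram
    |   # energy sits near 440 Hz early and near 880 Hz later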
| IshKebab wrote:
      | Yes exactly. This is a classic "no, cats and dogs don't
      | actually rain from the sky" article.
      |
      | Nobody who knows literally anything about signal processing
      | thought the ear was doing a Fourier transform. Is it doing
      | something _like_ an STFT? Obviously yes, and this article
      | doesn't go against that.
| km3r wrote:
| > one would suspect that the specific characteristics of the
| human cochlea might be tuned to human speech while still being
| able to process environmental and animal sounds sufficiently
| well.
|
| I wonder if these could be used to better master movies and
| television audio such that the dialogue is easier to hear.
| kiicia wrote:
| You are expecting too much, we still have no technology to do
| that, unless it's about clarity of advertisement jingles /s
| crazygringo wrote:
| Yeah, this article feels like it's very much setting up a
| ridiculous strawman.
|
| Nobody who knows anything about signal processing has ever
| suggested that the ear performs a Fourier transform _across
| infinite time_.
|
| But the ear _does_ perform something very much akin to the FFT
| (fast Fourier transform), turning discrete samples into
| intensities at frequencies -- which is, of course, what any
| reasonable person means when they say the ear does a Fourier
| transform.
|
| This article suggests it's accomplished by something between
| wavelet and Gabor transforms. Which, yes, is not _exactly_ a
| Fourier transform -- but it's producing something that is about
| 95-99% the same in the end.
|
| And again, nobody would ever suggest the ear was performing the
| _exact_ math that the FFT does, down to the last decimal point.
| But these filters still work essentially the same way as the
| FFT in terms of how they respond to a given frequency; it's
| really just how they're windowed.
|
| So if anyone just wants a simple explanation, I would say _yes_
| the ear does a Fourier transform. A discrete one with
| windowing.
| anyfoo wrote:
| Since we're being pedantic, there is _some_ confusion of
| ideas here (even though you do make some valid points).
|
| First, I think when you say FFT, you mean DFT. A Fourier
| transform is both non-discrete and infinite in time. A DTFT
| (discrete time fourier transform) is discrete, i.e. using
| samples, but infinite. A DFT (discrete fourier transform) is
| both finite (analyzed data has a start and an end) and
| discrete. An FFT is effectively an implementation of a DFT,
| and there is nothing indicating to me that hearing is in any
| way specifically related to how the FFT computes a DFT.
|
| But more importantly, I'm not sure DFT fits at all? This is
| an analog, real-world physical process, so where is it
| discrete, i.e. how does the ear capture _samples_?
|
  | I think what's happening is more akin to a Fourier _series_,
  | which is the missing fourth category completing (FT, DTFT,
  | DFT): continuous (non-discrete), but finite or rather periodic
  | in time.
  |
  | Secondly, unlike Gabor transforms, wavelet transforms are
  | specifically _not_ just windowed Fourier anythings (whether
  | FT/FS/DFT/DTFT). Those are commonly called "short-time Fourier
  | transforms" (STFT, existing again in discrete and non-discrete
  | variants), and the article straight up mentions in its
  | footnotes that they don't fit either.
  |
  | Wavelet transforms use an entirely different shape (e.g. a
  | Haar wavelet) that is shifted and stretched for analysis,
  | instead of a windowed sinusoid over a windowed signal.
|
| And I think those distinctions are what the article actually
| wanted to touch upon.
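  |
  | (A 2x2 mnemonic for those four, if it helps: continuous and
  | infinite time -> Fourier transform; continuous but periodic/
  | finite -> Fourier series; discrete and infinite -> DTFT;
  | discrete and finite -> DFT.)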
| kazinator wrote:
| > A Fourier transform has no explicit temporal precision, and
| resembles something closer to the waveforms on the right; this is
| not what the filters in the cochlea look like.
|
| Perhaps the ear does something more vaguely analogous to
| discrete Fourier transforms on samples of data, which is what
| we do in a lot of signal processing.
|
| In signal processing, we take windowed samples, and do discrete
| transforms on these. These do give us some temporal precision.
|
| There is a trade-off there between frequency and temporal
| precision, analogous to the Pauli exclusion principle in
| quantum mechanics. The better we know a frequency, the less
| precisely we know the timing. Only an infinite, periodic signal
| has a single precise frequency (or precise set of harmonics),
| appearing as infinitely narrow blips in the frequency domain.
|
| The continuous Fourier transform deals with entire signals. We
| transform a whole function like sin(x) over the entire domain.
| If that domain is interpreted as time, we are including all of
| eternity, so to speak, from negative infinite time to positive.
| xeonmc wrote:
  | > analogous to the Pauli exclusion principle
  |
  | Did you mean the Heisenberg Uncertainty Principle instead? Or
  | is there actually some connection of the Pauli Exclusion
  | Principle to conjugate transforms that I wasn't aware of?
| kvakkefly wrote:
| They are not connected afaik.
| HarHarVeryFunny wrote:
| > There is a trade off there between frequency and temporal
| precision
|
| Sure, and the FFT isn't inherently biased towards one vs the
| other. If you take an FFT over a long time window (narrowband
| spectrogram) then you get good frequency resolution at the cost
| of time resolution, and vice versa for a short time window
| (wideband spectrogram).
|
| For speech recognition ideally you'd want to use both since
| they are detecting different things. TFA is saying that this is
| in fact what our cochlea filter bank is doing, using different
| types of filter at different frequency ranges - better
| frequency resolution at lower frequencies where the formants
| are (carrying articulatory information), and better time
| resolution at the high frequencies generated by fricatives
| where frequency doesn't matter but accurate onset detection is
| useful for detecting plosives.
| energy123 wrote:
| STFT?
| adornKey wrote:
| This subject has bothered me for a long time. My question to
| guys into acoustics was always: if the cochlea performs some
| kind of Fourier transform, what are the chances that it uses
| sinus waves as the basis for the vector space? If it did
| anything like that, it could just as well use any slightly
| different wave-forms as a basis for the transformation.
| Stiffness and non-linearity will surely ensure that any ideal
| rubber model from physics behaves differently in reality from
| the perfect sinus.
| FarmerPotato wrote:
| I find it beautiful to see the term "sinus wave."
| empiricus wrote:
  | well, the cochlea is working within the realm of biological
  | and physical possibilities. basically it is a triangle through
  | which waves are propagating, with sensors along the edge. smth
  | smth this is similar to a filter bank of gabor filters that
  | respond to rising freq along the triangle edge. ergo you can
  | say fourier, but it only means sensors responding to different
  | freq because of their location.
| adornKey wrote:
    | Yeah, but not only the frequency is important - the wave-form
    | is very relevant. For example if your wave-form is a
    | triangle, listeners will tell you that it is very noisy
    | compared to a simple sinus. If you use sinus as the basis of
    | your vector space, triangles really look like a noisy mix. My
    | question is whether the basic elements are really sinus, or
    | if the basic eigen-waves of the cochlea are other wave-forms
    | (e.g. slightly wider or narrower than sinus, ...). If the
    | physics in the ear isn't linear, maybe sinus isn't the purest
    | wave-form for a listener.
    |
    | Most people in physics only know sinus and maybe sometimes
    | rectangles as a basis for transformations, but mathematically
    | you could use a lot of other things - maybe very similar to
    | sinus, but different.
| gowld wrote:
| Why is there no box diagram for the cochlea, "between wavelet
| and Gabor"?
| anticensor wrote:
  | It would still look too much like a wavelet.
| shermantanktop wrote:
| The thesis about human speech occupying less crowded spectrum is
| well aligned with a book called "The Great Animal Orchestra"
| (https://www.amazon.com/Great-Animal-Orchestra-Finding-
| Origin...).
|
| That author details how the "dawn chorus" is composed of a vast
| number of species making noise, but who are able to pick out
| mating calls and other signals due to evolving their
| vocalizations into unique sonic niches.
|
| It's quite interesting but also a bit depressing as he documents
| the decline in intensity of this phenomenon with habitat
| destruction etc.
| HarHarVeryFunny wrote:
| Birds have also evolved to choose when to vocalize to best be
| heard - doing so earlier in urban areas where later there will
| be more traffic noise, and later in some forest environments to
| avoid being drowned out by the early rising noisy insects.
| kulahan wrote:
    | Probably worth mentioning that as adaptations that allow
    | them to compete well in nature die out, ones that allow them
    | to compete well in cities take their place. Evolution is
    | always a series of tradeoffs.
    |
    | Maybe we don't have sonic variation, but temporal instead.
| bitwize wrote:
| Life uh, finds a way.
| brcmthrowaway wrote:
| OT: Does anyone here believe in Intelligent Design?
| xeonmc wrote:
| As low-level physical mechanistic processes? Absolutely not.
|
| As higher-order, statistically transparent abstract nudges of
| providence existing outside the confines of causality?
| Metaphysically interesting but philosophically futile.
| superb-owl wrote:
| The title seems a little click-baity and basically wrong. Gabor
| transforms, wavelet transforms, etc. are all generalizations of
| the Fourier transform, which give you a spectrum analysis at
| each point in time.
|
| The content is generally good, but I'd argue that the ear is
| indeed doing very Fourier-y things.
| debo_ wrote:
| Fourear transform
| antognini wrote:
| If you want to get really deep into this, Richard Lyon has spent
| decades developing the CARFAC model of human hearing: Cascade of
| Asymmetric Resonators with Fast-Acting Compression. As far as I
| know it's the most accurate digital model of human hearing.
|
| He has a PDF of his book about human hearing on his website:
| https://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018_smaller...
| javier_e06 wrote:
| This is fascinating.
|
| I know of vocoders in military hardware that encode voices to
| resemble something simpler for compression (a low-tone male
| voice): smaller packets that take less bandwidth. The ear must
| also have co-evolved with our vocal cords and mouth to occupy
| the available frequencies for transmission and reception for
| optimal communication.
|
| The parallels with waveforms don't end there. Waveforms are also
| optimized for different terrains (urban, jungle).
|
| Are languages organic waveforms optimized to ethnicity and
| terrain?
|
| Cool article indeed.
| rolph wrote:
| supplemental:
|
| Neuroanatomy, Auditory Pathway
|
| https://www.ncbi.nlm.nih.gov/books/NBK532311/
|
| Cochlear nerve and central auditory pathways
|
| https://www.britannica.com/science/ear/Cochlear-nerve-and-ce...
|
| Molecular Aspects of the Development and Function of Auditory
| Neurons
|
| https://pmc.ncbi.nlm.nih.gov/articles/PMC7796308/
| fennec-posix wrote:
| "It appears that human speech occupies a distinct time-frequency
| space. Some speculate that speech evolved to fill a time-
| frequency space that wasn't yet occupied by other existing
| sounds."
|
| I found this quite interesting, as I have noticed that I can
| detect voices in high-noise environments. E.g. HF Radio where
| noise is almost a constant if you don't use a digital mode.
| amelius wrote:
| What does the continuous tingling of a hair cell sound like to
| the subject?
| xmcqdpt2 wrote:
| Many versions of this article could be written:
|
| The computer does not do a Fourier transform (FFT computes the
| discrete Fourier transform)
|
| Spectroscopes don't do a Fourier transform (it's actually the
| short-time FT)
|
| The only thing that actually does a Fourier transform is a
| mathematician, with a pen and some paper.
| tim333 wrote:
| Nice to see a video for the tip links and ion channels.
|
| I spent a while reading up on that stuff because I was trying
| to figure out what causes my tinnitus. My best guess is that if
| the hairs bend too far, that structure can break and an ion
| channel can get stuck open, causing the cell to fire
| continually.
|
| Another fun ear fact is they incorporate active amplification.
| You can hook an electrical signal to the loudspeaker type cell to
| make it vibrate around https://youtu.be/pij8a8aNpWQ
___________________________________________________________________
(page generated 2025-10-30 23:00 UTC)