[HN Gopher] The ear does not do a Fourier transform
       ___________________________________________________________________
        
       The ear does not do a Fourier transform
        
       Author : izhak
       Score  : 291 points
       Date   : 2025-10-30 17:01 UTC (5 hours ago)
        
 (HTM) web link (www.dissonances.blog)
 (TXT) w3m dump (www.dissonances.blog)
        
       | p0w3n3d wrote:
       | Tbh I used to think that it does. For example, when playing
       | higher notes, it's harder to hear the out-of-tune frequencies
       | than on the lower notes.
        
         | fallingfrog wrote:
          | I haven't noticed that effect, to be honest. Actually I think
          | it's the really low bass frequencies that are harder to tune,
          | especially if you remove the harmonics and just leave the
         | fundamental.
         | 
         | Are you perhaps experiencing some high frequency hearing loss?
        
           | jacquesm wrote:
           | It's even more complex than that. The low notes are hard to
           | tune because the fundamentals are very close to each other
           | and you need to have super good hearing to match the beats,
           | fortunately they sound for a long time so that helps. Missing
           | fundamentals are a funny thing too, you might not be
           | 'hearing' what you think you hear at all! The high notes are
           | hard to tune because they sound very briefly (definitely on a
           | piano) and even the slightest movement of the pin will change
           | the pitch considerably.
           | 
           | In the middle range (say, A2 through A6) neither of these
           | issues apply, so it is - by far - the easiest to tune.
        
             | TheOtherHobbes wrote:
              | See also, psychoacoustics. The ear doesn't _just_ do
              | frequency decomposition. It's not clear if it even does
              | frequency decomposition. What actually happens is a lot of
              | perceptual modelling and relative amplitude masking, which
              | makes it possible to do real-time source separation.
             | 
             | Which is why we can hear individual instruments in a mix.
             | 
             | And this ability to separate sources can be trained. Just
             | as pitch perception can be trained, with varying results
             | from increased acuity up to full perfect pitch.
             | 
             | A component near the bottom of all that is range-based
             | perception of consonance and dissonance, based on the
             | relationships between beat frequencies and fundamentals.
             | 
             | Instead of a vanilla Fourier transform, frequencies are
             | divided into multiple critical bands (q.v.) with different
             | properties and effects.
             | 
             | What's interesting is that the critical bands seem to be
             | dynamic, so they can be tuned to some extent depending on
             | what's being heard.
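              | 
              | A minimal sketch of that banding, using Traunmuller's
              | (1990) approximation of the Bark critical-band scale (the
              | sample frequencies are arbitrary illustration values):
              | 
              |     import numpy as np
              | 
              |     def hz_to_bark(f_hz):
              |         # Traunmuller (1990) approximation: maps Hz onto the
              |         # Bark scale, where one Bark ~ one critical band
              |         return 26.81 * f_hz / (1960.0 + f_hz) - 0.53
              | 
              |     for f in (100, 500, 1000, 4000, 10000):
              |         print(f"{f:6d} Hz -> {hz_to_bark(f):5.2f} Bark")
              |     # equal steps in Hz span fewer and fewer critical bands
              |     # as frequency rises: the bands widen toward the top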
             | 
             | Most audio theory has a vanilla EE take on all of this,
             | with concepts like SNR, dynamic range, and frequency
             | resolution.
             | 
             | But the experience of audio is hugely more complex. The
             | brain-ear system is an intelligent system which actively
             | classifies, models, and predicts sounds, speech, and music
             | as they're being heard, at various perceptual levels, all
             | in real time.
        
               | jacquesm wrote:
               | Yes, indeed, to think about the ear as the thing that
               | hears is already a huge error. The ear is - at best - a
               | faulty transducer with its own unique way of turning air
               | pressure variations into nerve impulses and what the
               | brain does with those impulses is as much a part of
               | hearing as the mechanics of the ear, just like a computer
               | keyboard does not interpret your keystrokes, it just
               | turns them into electrical signals.
        
             | fallingfrog wrote:
              | Well. On guitar you can't really use the "matching the
              | beats" trick, or the one where you play the 4th on the
              | string below and make them sound in unison, because if you
              | do that all the way up the neck your guitar will be tuned
              | to just intonation instead of equal temperament and
              | certain chords will sound very bad. A series of perfect
              | 4ths and a perfect 3rd does not add up to an octave. It's
              | better to reference everything to the low E string and
              | just kind of know where the pitches are supposed to land.
             | 
             | That's a side note, the rest of what you wrote was very
             | informative!
        
         | philip-b wrote:
         | No, it's vice versa. If two wind instruments play unison
         | slightly out of tune from each other, it will be very
         | noticeable. If the bass is slightly out of tune or mistakenly
         | plays a different note a semitone up or down, it's easy to not
         | notice it.
        
       | bloppe wrote:
       | Man, I've been spreading disinformation for years.
        
         | rolph wrote:
          | the closest i have been was acoustic phase discrimination by
          | owls.
          | 
          | there appears to be no software for this, it's all hardware;
          | the signal format flips as it travels through the anatomy.
        
           | nakulgarg22 wrote:
           | This might be interesting for you -
           | https://nakulg.com/assets/papers/owlet_mobisys2021_nakul.pdf
           | 
            | Owls have an asymmetric skull structure which helps them
            | with spatial perception of sound.
        
             | rolph wrote:
             | that was the start of it. the offset otic openings result
             | in differential arrival times of the acoustic peaks, thus
             | phase differential.
             | 
              | neurosynaptically, there is no phase; there is frequency
              | shift corresponding to presynaptic intensity, and there is
              | spatio-temporal integration of these signals. temporal
              | integration is where "phase" matters.
              | 
              | it's all a mix of "digital" all-or-nothing "gates" and
              | analog frequency-shift propagation of the "gate" output.
              | 
              | it's all made nebulous by the adaptive and hysteretic
              | nature of the elements in neural "circuitry"
        
           | lukeinator42 wrote:
           | also, the common ancestor of mammals and birds did not have a
           | tympanic ear, so sound localization evolved differently in
           | the avian vs. mammalian hearing systems. A good review is
           | here: https://journals.physiology.org/doi/pdf/10.1152/physrev
            | .0002.... How the brain calculates interaural time delays
            | is actually an interesting problem, as the delays are
            | shorter than the time it takes a neuron to fire an action
            | potential.
        
         | saltcured wrote:
         | This is one of those pedant vs cocktail chatterer distinctions.
          | It's an interesting dive and gives a nice, triggering
          | headline.
         | 
         | But, to the vast majority who don't really know or care about
         | the math, "Fourier Transform" is, at best, a totem for the
         | entire concept space of "frequency domain", "spectral
         | decomposition", etc.
         | 
         | They are not making fine distinctions of tradeoffs among
         | different methods. I'm not sure I'd even call it disinformation
         | to tell this hand-wavy story and pique someone's interest in a
         | topic they otherwise never thought about...
        
       | rolph wrote:
        | FT is a frequency-domain representation.
        | 
        | neural signaling by action potential is also a representation
        | of intensity by frequency.
        | 
        | the cochlea is where you can begin to talk about bio-FT
        | phenomena.
        | 
        | however, the format "changes" along the signal path whenever a
        | synapse occurs.
        
       | xeonmc wrote:
       | Nit: It's an unfortunate confusion of naming conventions, but
       | Fourier Transform in the strictest sense implies an infinite
       | "sampling" period, while the finite "sample" period counterpart
       | would correspond to Fourier _Series_ even though we colloquially
       | refer to them interchangeably.
       | 
        | (I put "sampling" in quotes as it's really an "integration
        | period" in this context of continuous-time integration, though
        | that would be less immediately evocative of the concept people
        | are colloquially familiar with. If we further impose a
        | constraint of finite temporal resolution so that it is honest-
        | to-god "sampling", then it becomes the Discrete Fourier
        | Transform, of which the Fast Fourier Transform is one
        | implementation.)
       | 
       | It is this strict definition that the article title is rebuking,
       | but it's not quite what the colloquial usage loosely evokes in
       | most people's minds when we usually say Fourier Transform as an
       | analysis tool.
       | 
       | So this article should have been comparing to Fourier Series
       | analysis rather than Fourier Transform in the pedantic sense,
       | albeit that'll be a bit less provocative.
       | 
        | Regardless, it doesn't at all take away from the salient points
        | of this excellent article, which offer a really interesting
        | reframing of the concepts: what the ear does mechanistically is
        | apply a temporal "weighting function" (filter), so it's
        | somewhere between Fourier series and Fourier transform. The
        | article hits the nail on the head in presenting the sliding
        | scale of conjugate-domain trade-offs (think: Heisenberg).
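        | 
        | (For reference, the standard textbook definitions being
        | contrasted here, in LaTeX:
        | 
        |     \hat{x}(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt
        |     \quad \text{Fourier transform: infinite integration period}
        | 
        |     c_k = \frac{1}{T} \int_0^T x(t)\, e^{-2\pi i k t / T}\, dt
        |     \quad \text{Fourier series: finite period } T
        | 
        |     X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N}
        |     \quad \text{DFT: sampled and finite; the FFT computes this}
        | 
        | The ear's filters impose a finite, frequency-dependent window,
        | hence "somewhere between" the first two.)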
        
         | meowkit wrote:
          | I was a bit peeved by the title, but I think it's a fair use
          | of clickbait, as the article has a lot of little details
          | about acoustics in humans that I was unfamiliar with (e.g. a
          | link to a primer on the transduction implementation of
          | cochlear cilia).
         | 
         | But yeah there is a strict vs colloquial collision here.
        
         | BrenBarn wrote:
         | Yeah, it's sort of like saying the ear doesn't do "a" Fourier
         | transform, it does a bunch of Fourier transforms on samples of
         | data, with a varying tradeoff between temporal and frequency
         | resolution. But most people would still say that's doing a
         | Fourier transform.
         | 
         | As the article briefly mentions, it's a tempting hypothesis
         | that there is a relationship between the acoustic properties of
         | human speech and the physical/neural structure of the auditory
         | system. It's hard to get clear evidence on this but a lot of
         | people have a hunch that there was some coevolution involved,
         | with the ear's filter functions favoring the frequency ranges
         | used by speech sounds.
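          | 
          | A minimal numpy sketch of that tradeoff (sample rate, tones,
          | and window sizes are arbitrary illustration values):
          | 
          |     import numpy as np
          | 
          |     fs = 8000                          # sample rate, Hz
          |     t = np.arange(fs) / fs             # one second of signal
          |     x = np.sin(2*np.pi*440*t) + np.sin(2*np.pi*450*t)
          | 
          |     for n in (256, 4096):              # short vs long window
          |         spectrum = np.abs(np.fft.rfft(x[:n] * np.hanning(n)))
          |         print(f"{1000*n/fs:4.0f} ms window -> "
          |               f"{fs/n:5.2f} Hz between bins")
          |     # 32 ms window: ~31 Hz bins, so the 440/450 Hz pair blurs
          |     # into one peak; 512 ms window: ~2 Hz bins resolve them,
          |     # at the cost of half a second of temporal smearing.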
        
           | foobarian wrote:
           | This is something you quickly learn when you read the theory
           | in the textbook, get excited, and sit down to write some code
           | and figure out that you'll have to pick a finite buffer size.
           | :-)
        
           | aidenn0 wrote:
           | > ...it's a tempting hypothesis that there is a relationship
           | between the acoustic properties of human speech and the
           | physical/neural structure of the auditory system.
           | 
           | This seems trivially true in the sense that human speech is
           | intelligible by humans; there are many sounds that humans
           | cannot hear and/or distinguish, and speech does not involve
           | those.
        
       | tryauuum wrote:
       | man I need to finally learn what a Fourier transform is
        
         | TobTobXX wrote:
         | 3Blue1Brown has a really good explanation here:
         | https://www.youtube.com/watch?v=spUNpyF58BY
         | 
         | It gave me a much better intuition than my math course.
        
         | jama211 wrote:
         | Hahaha, I was working on learning these in second year uni...
         | which was also exactly when I switched from an electrical
         | engineering focussed degree to a software one!
         | 
         | Perhaps finally I should learn too...
        
         | garbageman wrote:
         | It's an absolutely brilliant bit of maths that breaks a complex
         | waveform into the individual components. Kind of like taking an
         | orchestral song and then working out each individual
         | instrument's contribution. Learning about this left me honestly
         | aghast and in shock that it's not only possible but that
         | someone (Joseph Fourier) figured it out and then shared it with
         | the world.
         | 
         | This video does a great job explaining what it is and how it
         | works to the layman. 3blue1brown -
         | https://www.youtube.com/watch?v=spUNpyF58BY
        
         | adzm wrote:
          | the very simplest way to describe it: it is what turns a
          | waveform (amplitude x time) into a spectrum like the display
          | on a stereo (amplitude x frequency)
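          | 
          | A two-tone sketch of exactly that (numpy; the frequencies are
          | arbitrary illustration values):
          | 
          |     import numpy as np
          | 
          |     fs = 1000                       # sample rate, Hz
          |     t = np.arange(fs) / fs          # one second of signal
          |     x = np.sin(2*np.pi*50*t) + 0.5*np.sin(2*np.pi*120*t)
          | 
          |     spectrum = np.fft.rfft(x)       # complex: magnitude AND phase
          |     freqs = np.fft.rfftfreq(len(x), d=1/fs)
          |     print(freqs[np.abs(spectrum) > 100])   # -> [ 50. 120.]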
        
           | Chabsff wrote:
           | And phase. People always forget about the phase as if it was
           | purely imaginary.
        
             | JKCalhoun wrote:
             | Ha ha, as I understand it, phase is imaginary in a Fourier
             | transform. Complex numbers are used and the imaginary
             | portion does indeed represent phase.
             | 
             | I have been told that reversing the process -- creating a
             | time-based waveform -- will not resemble (visually) the
             | original due to this phase loss in the round-tripping. But
             | then our brain never paid phase any mind so it will sound
             | the same to our ears. (Yay, MP3!)
        
               | Chabsff wrote:
               | I'm glad someone picked up on my dumb joke :), I was
               | getting worried.
               | 
               | That being said, round-tripping works just fine,
               | axiomatically so, until you go out of your way to discard
               | the imaginary component.
        
               | DonHopkins wrote:
               | Even more complex and reflectively imaginative than the
               | Fourier Transform is the mighty Cepstrum!
               | 
               | https://en.wikipedia.org/wiki/Cepstrum
               | 
               | It's literally a "backwards spectrum", and the authors in
               | 1963 were having such jolly fun they reversed the words
               | too: quefrency => frequency, saphe => phase, alanysis =>
               | analysis, liftering => filtering
               | 
               | The cepstrum is the "spectrum of a log spectrum," where
               | taking the complex logarithm turns multiplicative
               | spectral features into additive ones, laying the
               | foundation of cepstral alanysis, and later, the
               | physiologically tuned Mel-frequency cepstrum used in
               | audio compression and speech recognition.
               | 
               | https://en.wikipedia.org/wiki/Mel_scale
               | 
               | >The mel scale (after the word melody)[1] is a perceptual
               | scale of pitches judged by listeners to be equal in
               | distance from one another. [...] Use of the mel scale is
               | believed to weigh the data in a way appropriate to human
               | perception.
               | 
               | As Tukey might say: once you start doing cepstral
               | alanysis, there's no turning back, except inversely.
               | 
               | Skeptics said he was just going through a backwards
               | phase, but it turned out to work! ;)
               | 
               | https://news.ycombinator.com/item?id=24386845
               | 
               | DonHopkins on Sept 5, 2020 | parent | context | favorite
               | | on: Mathematicians should stop naming things after
               | eac...
               | 
               | I love how they named the inverse spectrum the cepstrum,
               | which uses quefrency, saphe, alanysis, and liftering,
               | instead of frequency, phase, analysis and filtering. It
               | should not be confused with the earlier concept of the
               | kepstrum, of course! ;)
               | 
               | https://en.wikipedia.org/wiki/Cepstrum
               | 
               | >References to the Bogert paper, in a bibliography, are
               | often edited incorrectly. The terms "quefrency",
               | "alanysis", "cepstrum" and "saphe" were invented by the
               | authors by rearranging some letters in frequency,
               | analysis, spectrum and phase. The new invented terms are
               | defined by analogies to the older terms.
               | 
               | >Thus: The name cepstrum was derived by reversing the
               | first four letters of "spectrum". Operations on cepstra
               | are labelled quefrency analysis (aka quefrency
               | alanysis[1]), liftering, or cepstral analysis. It may be
               | pronounced in the two ways given, the second having the
               | advantage of avoiding confusion with "kepstrum", which
               | also exists (see below). [...]
               | 
               | >The kepstrum, which stands for "Kolmogorov-equation
               | power-series time response", is similar to the cepstrum
               | and has the same relation to it as expected value has to
               | statistical average, i.e. cepstrum is the empirically
               | measured quantity, while kepstrum is the theoretical
               | quantity. It was in use before the cepstrum.[12][13]
               | 
               | https://news.ycombinator.com/item?id=43341806
               | 
               | DonHopkins 7 months ago | parent | context | favorite |
               | on: What makes code hard to read: Visual patterns of c...
               | 
               | Speaking of filters and clear ergonomic abstractions, if
               | you like programming languages with keyword pairs like
               | if/fi, for/rof, while/elihw, goto/otog, you will LOVE the
               | cabkwards covabulary of cepstral quefrency alanysis,
               | invented in 1963 by B. P. Bogert, M. J. Healy, and J. W.
               | Tukey:
               | 
               | cepstrum: inverse spectrum
               | 
               | lifter: inverse filter
               | 
               | saphe: inverse phase
               | 
               | quefrency alanysis: inverse frequency analysis
               | 
               | gisnal orpcessing: inverse signal processing
               | 
               | https://en.wikipedia.org/wiki/Cepstrum
               | 
               | https://news.ycombinator.com/item?id=44062022
               | 
               | DonHopkins 5 months ago | parent | context | favorite |
               | on: The scientific "unit" we call the decibel
               | 
               | At least the Mel-frequency cepstrum is honest about being
               | a perceptual scale anchored to human hearing, rather than
               | posing as a universally-applicable physical unit.
               | 
               | https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
               | 
               | >Mel-frequency cepstral coefficients (MFCCs) are
               | coefficients that collectively make up an MFC. They are
               | derived from a type of cepstral representation of the
               | audio clip (a nonlinear "spectrum-of-a-spectrum"). The
               | difference between the cepstrum and the mel-frequency
               | cepstrum is that in the MFC, the frequency bands are
               | equally spaced on the mel scale, which approximates the
               | human auditory system's response more closely than the
               | linearly-spaced frequency bands used in the normal
               | spectrum. This frequency warping can allow for better
               | representation of sound, for example, in audio
               | compression that might potentially reduce the
               | transmission bandwidth and the storage requirements of
               | audio signals.
               | 
               | https://en.wikipedia.org/wiki/Psychoacoustics
               | 
               | >Psychoacoustics is the branch of psychophysics involving
               | the scientific study of the perception of sound by the
               | human auditory system. It is the branch of science
               | studying the psychological responses associated with
               | sound including noise, speech, and music. Psychoacoustics
               | is an interdisciplinary field including psychology,
               | acoustics, electronic engineering, physics, biology,
               | physiology, and computer science.
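                | 
                | A toy numpy sketch of the real cepstrum described above
                | (the pitch, sample rate, and search range are arbitrary
                | illustration values):
                | 
                |     import numpy as np
                | 
                |     def real_cepstrum(x):
                |         # spectrum of a log spectrum: the log turns the
                |         # multiplicative source*filter structure additive
                |         return np.fft.irfft(
                |             np.log(np.abs(np.fft.rfft(x)) + 1e-12))
                | 
                |     fs = 16000
                |     n = fs // 10                      # 100 ms of signal
                |     # crude "voiced" source: 100 Hz pulse train (10 ms period)
                |     x = (np.arange(n) % (fs // 100) == 0).astype(float)
                |     x = np.convolve(x, np.hanning(64), mode="same")
                | 
                |     ceps = real_cepstrum(x)
                |     peak = np.argmax(ceps[50:400]) + 50   # skip low quefrencies
                |     print(1000 * peak / fs, "ms")   # ~10 ms -> 100 Hz pitch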
        
               | xeonmc wrote:
                | Actually, by the Kramers-Kronig relation you can infer
                | the imaginary part just from the real part, given that
                | your time signal is causal. So the phase isn't actually
                | lost at all, if you assume causality.
                | 
                | Also, pedantic nit: phase would be the imaginary
                | _exponent_ of the spectrum rather than the imaginary
                | part directly, i.e., you take the logarithm of the
                | complex amplitude to get log-magnitude (real) plus phase
                | (imaginary).
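                | 
                | As a quick numpy check of that nit:
                | 
                |     import numpy as np
                | 
                |     z = 3 * np.exp(1j * 0.7)   # magnitude 3, phase 0.7 rad
                |     print(np.log(z))   # (1.0986...+0.7j): log-mag + i*phase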
        
         | CGMthrowaway wrote:
         | It's a Copy>Paste Special>Transpose on a waveform, converting
         | Rows/Columns that are time/amplitude (with wavelength embedded)
         | into Rows/Columns that are frequency/amplitude (for a snapshot
         | in time).
         | 
         | People love to go on about how brilliant it is and they're
         | probably right but that's how I understand it.
        
           | TheOtherHobbes wrote:
           | Pretty much, but phase is also included. Which matters for
           | some things.
        
             | bobmcnamara wrote:
             | But mostly not for ears it turns out!
             | 
             | Phase matters for some wideband signals, but most folks
             | struggle to tell apart audio from hilbert-90-degree-
             | shifted-audio
        
               | xeonmc wrote:
                | Phase is required if it is to be a reversible transform.
                | Otherwise it would just be a functional.
        
         | anigbrowl wrote:
          | Read this (which is free): _The Scientist's and Engineer's
          | Guide to Digital Signal Processing_: https://www.dspguide.com
          | 
          | It's very comprehensive, but it's also very well written and
          | walks you through the mechanics of Fourier transforms in a
          | way that makes them intuitive.
        
         | dsego wrote:
         | humble plug https://dsego.github.io/demystifying-fourier/
        
       | edbaskerville wrote:
       | To summarize: the ear does not do a Fourier transform, but it
       | does do a time-localized frequency-domain transform akin to
       | wavelets (specifically, intermediate between wavelet and Gabor
       | transforms). It does this because the sounds processed by the ear
       | are often localized in time.
       | 
       | The article also describes a theory that human speech evolved to
       | occupy an unoccupied space in frequency vs. envelope duration
       | space. It makes no explicit connection between that fact and the
       | type of transform the ear does--but one would suspect that the
       | specific characteristics of the human cochlea might be tuned to
       | human speech while still being able to process environmental and
       | animal sounds sufficiently well.
       | 
       | A more complicated hypothesis off the top of my head: the
       | location of human speech in frequency/envelope is a tradeoff
       | between (1) occupying an unfilled niche in sound space; (2)
       | optimal information density taking brain processing speed into
       | account; and (3) evolutionary constraints on physiology of sound
       | production and hearing.
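        | 
        | A sketch of the wavelet-vs-Gabor contrast in window lengths
        | (the 10 ms window and 5-cycle constant are arbitrary
        | illustration values):
        | 
        |     import numpy as np
        | 
        |     def gabor_sigma(f_hz, sigma=0.010):
        |         # Gabor/STFT: one window length for every frequency
        |         return sigma
        | 
        |     def wavelet_sigma(f_hz, cycles=5.0):
        |         # wavelet: window shrinks as 1/f, ~5 cycles per atom
        |         return cycles / (2 * np.pi * f_hz)
        | 
        |     for f in (100.0, 1000.0, 8000.0):
        |         print(f"{f:6.0f} Hz: gabor {1e3*gabor_sigma(f):5.1f} ms, "
        |               f"wavelet {1e3*wavelet_sigma(f):5.2f} ms")
        |     # an "intermediate" transform sits in between: window length
        |     # falls with frequency, but less steeply than strict 1/f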
        
         | AreYouElite wrote:
          | Do you believe it might be possible that the frequency band
          | of human speech is not determined by such factors at all, but
          | is more a function of height? Kids have higher voices; adults
          | have deeper voices. Similar to stringed instruments: the
          | viola is high-pitched and the bass low-pitched.
          | 
          | I'm no expert in these matters, just speculating...
        
           | fwip wrote:
           | It's not height, but vocal cord length and thickness. Longer
           | vocal cords (induced by testosterone during puberty) vibrate
           | more slowly, with a lower frequency/pitch.
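            | 
            | For intuition, the ideal-string relation (vocal folds are
            | not ideal strings, but the scaling with length carries
            | over):
            | 
            |     f_1 = \frac{1}{2L} \sqrt{\frac{T}{\mu}}
            | 
            | with vibrating length L, tension T, and linear density \mu:
            | longer and heavier means a lower fundamental.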
        
         | matthewdgreen wrote:
         | If you take this thought process even farther, specific words
         | and phonemes should occupy specific slices of the tradeoff
         | space. Across all languages and cultures, an immediate warning
         | that a tiger is about to jump on you should sit in a different
         | place than a mother comforting a baby (which, of course, it
         | does.) Maybe that even filters down to ordinary conversational
         | speech.
        
         | xeonmc wrote:
          | Analogy: when you knock on doors, how do you decide what
          | rhythm and duration to use, so that it won't be mistaken for
          | an accidental bump against the door?
        
           | toast0 wrote:
           | Shave and a haircut is the only option in my knocking
           | decision tree.
        
             | cnity wrote:
             | Thanks for giving your two bits on the matter.
        
             | throwaway198846 wrote:
             | ... What does that mean?
        
               | crazygringo wrote:
               | https://en.wikipedia.org/wiki/Shave_and_a_Haircut
        
         | a-dub wrote:
         | > At high frequencies, frequency resolution is sacrificed for
         | temporal resolution, and vice versa at low frequencies.
         | 
         | this is the time-frequency uncertainty principle. intuitively
         | it can be understood by thinking about wavelength. the more
         | stretched out the waveform is in time, the more of it you need
         | to see in order to have a good representation of its frequency,
         | but the more of it you see, the less precise you can be about
         | where exactly it is.
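          | 
          | in symbols (the standard statement; gaussian windows achieve
          | equality):
          | 
          |     \sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi}
          | 
          | where \sigma_t and \sigma_f are the standard deviations of
          | the signal's energy in time and in frequency.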
         | 
         | > but it does do a time-localized frequency-domain transform
         | akin to wavelets
         | 
         | maybe easier to conceive of first as an arbitrarily defined
         | filter bank based on physiological results rather than trying
         | to jump directly to some neatly defined set of orthogonal basis
         | functions. additionally, orthogonal basis functions cannot, by
         | definition, capture things like masking effects.
         | 
         | > A more complicated hypothesis off the top of my head: the
         | location of human speech in frequency/envelope is a tradeoff
         | between (1) occupying an unfilled niche in sound space; (2)
         | optimal information density taking brain processing speed into
         | account; and (3) evolutionary constraints on physiology of
         | sound production and hearing.
         | 
         | (4) size of the animal.
         | 
          | notably: some smaller creatures have ultrasonic vocalization
          | and sensory capability. sometimes this is hypothesized to
          | complement visual perception for avoiding predators, but it
          | also could just have a lot to do with the fact that, well,
          | they have tiny articulators and tiny vocalizations!
        
           | Terr_ wrote:
           | > it also could just have a lot to do with the fact that,
           | well, they have tiny articulators and tiny vocalizations!
           | 
           | Now I'm imagining some alien shrew with vocal-cords (or
           | syrinx, or whatever) that runs the entire length of its body,
           | just so that it can emit lower-frequency noises for some
           | reason.
        
             | bragr wrote:
             | Well without the humorous size difference, this is
             | basically what whales and elephants do for long distance
             | communication.
        
               | Terr_ wrote:
               | Was playing around with a fundamental frequency
               | calculator [0] to associate certain sizes to hertz, then
               | using a tone-generator [1] to get a subjective idea of
               | what it'd sound like.
               | 
               | Though of course, nature has plenty of other tricks, like
               | how Koalas can go down to ~27hz. [2]
               | 
               | [0] https://acousticalengineer.com/fundamental-frequency-
               | calcula...
               | 
               | [1] https://www.szynalski.com/tone-generator/
               | 
               | [2] https://www.nature.com/articles/nature.2013.14275
        
             | Y_Y wrote:
             | Sounds like an antenna, if you'll accept electromagnetic
             | noise then there are some fish that could pass for your
             | shrew, e.g. https://en.wikipedia.org/wiki/Gymnotus
        
         | SoftTalker wrote:
         | Ears evolved long before speech did. Probably in step with
         | vocalizations however.
        
           | Sharlin wrote:
           | Not sure about that; I'd guess that vibration-sensing organs
           | first evolved to sense disturbances (in water, on seafloor,
           | later on dry ground and in air) caused by movement, whether
           | of a predator, prey, or a potential mate. Intentional
           | vocalizations for signalling purposes then evolved to utilize
           | the existing modality.
        
         | FarmerPotato wrote:
          | Is that a human understanding, or is it just an AI that read
          | the text and ignored the pictures?
         | 
         | Why do we need a summary in a post that adds nothing new to the
         | conversation?
        
           | pests wrote:
           | Are you saying your parent post was an AI summary? There is
           | original speculation at the end and it didn't come off that
           | way to me.
        
         | dsp_person wrote:
         | Even if it is doing a wavelet transform, I still see that as
         | made of Fourier transforms. Not sure if there's a good way to
         | describe this.
         | 
          | We can build either a short-time Fourier transform or a
          | wavelet transform in the same two ways:
          | 
          | - a filterbank approach, integrating signals in time
          | 
          | - taking Fourier transforms of time slices, integrating in
          | frequency
          | 
          | The same machinery, just with different filters.
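          | 
          | One DFT bin as a filter, to make "same machinery" concrete
          | (window length, bin, and position are arbitrary illustration
          | values):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     x = rng.standard_normal(4096)
          |     N, k, m = 256, 10, 1000     # window length, bin, position
          | 
          |     # view 1: FFT of a windowed time slice
          |     w = np.hanning(N)
          |     slice_view = np.fft.fft(x[m:m+N] * w)[k]
          | 
          |     # view 2: FIR filter whose taps are the reversed,
          |     # modulated window, sampled at the matching instant
          |     h = (w * np.exp(-2j*np.pi*k*np.arange(N)/N))[::-1]
          |     filter_view = np.convolve(x, h)[m + N - 1]
          | 
          |     print(np.allclose(slice_view, filter_view))   # True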
        
         | psunavy03 wrote:
         | > A more complicated hypothesis off the top of my head: the
         | location of human speech in frequency/envelope is a tradeoff
         | between (1) occupying an unfilled niche in sound space; (2)
         | optimal information density taking brain processing speed into
         | account; and (3) evolutionary constraints on physiology of
         | sound production and hearing.
         | 
         | Well from an evolutionary perspective, this would be
         | unsurprising, considering any other forms of language would
         | have been ill-fitted for purpose and died out. This is really
         | just a flavor of the anthropic principle.
        
         | lgas wrote:
         | > It does this because the sounds processed by the ear are
         | often localized in time.
         | 
         | What would it mean for a sound to not be localized in time?
        
           | littlestymaar wrote:
           | A continuous sinusoidal sound, I guess?
        
           | hansvm wrote:
           | It would look like a Fourier transform ;)
           | 
           | Zooming in to cartoonish levels might drive the point home a
           | bit. Suppose you have sound waves
           | |---------|---------|---------|
           | 
           | What is the frequency exactly 1/3 the way between the first
           | two wave peaks? It's a nonsensical question. The frequency
           | relates to the time delta between peaks, and looking locally
           | at a sufficiently small region of time gives no information
           | about that phenomenon.
           | 
           | Let's zoom out a bit. What's the frequency over a longer
           | period of time, capturing a few peaks?
           | 
           | Well...if you know there is only one frequency then you can
           | do some math to figure it out, but as soon as you might be
           | describing a mix of frequencies you suddenly, again,
           | potentially don't have enough information.
           | 
           | That lack of information manifests in a few ways. The exact
           | math (Shannon's theorems?) suggests some things, but the
           | language involved mismatches with human perception
           | sufficiently that people get burned trying to apply it too
           | directly. E.g., a bass beat with a bit of clock skew is very
           | different from a bass beat as far as a careless decomposition
           | is concerned, but it's likely not observable by a human
           | listener.
           | 
           | Not being localized in time means* you look at longer
           | horizons, considering more and more of those interactions.
           | Instead of the beat of a 4/4 song meaning that the frequency
           | changes at discrete intervals, it means that there's a
           | larger, over-arching pattern capturing "the frequency
           | distribution" of the entire song.
           | 
           | *Truly time-nonlocalized sound is of course impossible, so
           | I'm giving some reasonable interpretation.
        
             | jancsika wrote:
             | > It's a nonsensical question.
             | 
             | Are you talking about a discrete signal or a continuous
             | signal?
        
           | xeonmc wrote:
           | Means that it is a broad spectrum signal.
           | 
           | Imagine the dissonant sound of hitting a trashcan.
           | 
           | Now imagine the sound of pressing down all 88 keys on a piano
           | simultaneously.
           | 
           | Do they sound similar in your head?
           | 
            | The localization happens where the phases of all the
            | frequency components align and coherently construct into a
            | pulse, while further away in time their phases are
            | misaligned and cancel each other out.
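            | 
            | A quick numpy sketch of that phase-alignment picture
            | (component count chosen arbitrarily):
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     t = np.arange(1024) / 1024
            |     comps = range(1, 200)
            | 
            |     click = sum(np.cos(2*np.pi*f*t) for f in comps)
            |     hiss = sum(np.cos(2*np.pi*f*t + rng.uniform(0, 2*np.pi))
            |                for f in comps)
            | 
            |     # peak-to-background ratio: aligned phases pile up into
            |     # a pulse at t=0; scrambled phases stay noise-like
            |     print(click.max() / click.std())   # ~20
            |     print(hiss.max() / hiss.std())     # ~3-4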
        
         | patrickthebold wrote:
         | I think I might be missing something basic, but if you actually
         | wanted to do a Fourier transform on the sound hitting your ear,
         | wouldn't you need to wait your entire lifetime to compute it?
         | It seems pretty clear that's not what is happening, since you
         | can actually hear things as they happen.
        
           | xeonmc wrote:
           | You'll also need to have existed and started listening before
           | the beginning of time, forever and ever. Amen.
        
           | cherryteastain wrote:
           | Not really, just as we can create spectrograms [1] for a real
           | time audio feed without having to wait for the end of the
           | recording by binning the signal into timewise chunks.
           | 
           | [1] https://en.wikipedia.org/wiki/Spectrogram
        
             | IshKebab wrote:
             | Those use the Short-Time Fourier Transform, which is very
             | much like what the ear does.
             | 
             | https://en.wikipedia.org/wiki/Short-time_Fourier_transform
        
           | bonoboTP wrote:
            | Yes, for the vanilla Fourier transform you have to
            | integrate from negative to positive infinity. But more
            | practically you can put a temporally finite-support window
            | function on it, so you only analyze a part of the signal.
            | Whenever you see a 2D spectrogram image in audio editing
            | software, where the audio engineer can suppress a certain
            | range of frequencies in a certain time period, they use
            | something like this.
           | 
           | It's called the short-time Fourier transform (STFT).
           | 
           | https://en.wikipedia.org/wiki/Short-time_Fourier_transform
        
           | IshKebab wrote:
           | Yes exactly. This is a classic "no cats and dogs don't
           | actually rain from the sky" article.
           | 
            | Nobody who knows literally anything about signal processing
            | thought the ear was doing a Fourier transform. Is it doing
            | something _like_ an STFT? Obviously yes, and this article
            | doesn't go against that.
        
         | km3r wrote:
         | > one would suspect that the specific characteristics of the
         | human cochlea might be tuned to human speech while still being
         | able to process environmental and animal sounds sufficiently
         | well.
         | 
         | I wonder if these could be used to better master movies and
         | television audio such that the dialogue is easier to hear.
        
           | kiicia wrote:
           | You are expecting too much, we still have no technology to do
           | that, unless it's about clarity of advertisement jingles /s
        
         | crazygringo wrote:
         | Yeah, this article feels like it's very much setting up a
         | ridiculous strawman.
         | 
         | Nobody who knows anything about signal processing has ever
         | suggested that the ear performs a Fourier transform _across
         | infinite time_.
         | 
         | But the ear _does_ perform something very much akin to the FFT
         | (fast Fourier transform), turning discrete samples into
         | intensities at frequencies -- which is, of course, what any
         | reasonable person means when they say the ear does a Fourier
         | transform.
         | 
          | This article suggests it's accomplished by something between
          | wavelet and Gabor. Which, yes, is not _exactly_ a Fourier
          | transform -- but it's producing something that is about
          | 95-99% the same in the end.
          | 
          | And again, nobody would ever suggest the ear was performing
          | the _exact_ math that the FFT does, down to the last decimal
          | point. But these filters still work essentially the same way
          | as the FFT in terms of how they respond to a given frequency;
          | it's really just how they're windowed.
          | 
          | So if anyone just wants a simple explanation, I would say
          | _yes_, the ear does a Fourier transform. A discrete one with
          | windowing.
        
           | anyfoo wrote:
           | Since we're being pedantic, there is _some_ confusion of
           | ideas here (even though you do make some valid points).
           | 
           | First, I think when you say FFT, you mean DFT. A Fourier
           | transform is both non-discrete and infinite in time. A DTFT
            | (discrete-time Fourier transform) is discrete, i.e. using
            | samples, but infinite. A DFT (discrete Fourier transform) is
           | both finite (analyzed data has a start and an end) and
           | discrete. An FFT is effectively an implementation of a DFT,
           | and there is nothing indicating to me that hearing is in any
           | way specifically related to how the FFT computes a DFT.
           | 
           | But more importantly, I'm not sure DFT fits at all? This is
           | an analog, real-world physical process, so where is it
           | discrete, i.e. how does the ear capture _samples_?
           | 
            | I think what's happening is more akin to a Fourier
            | _series_, which is the missing fourth category completing
            | (FT, DTFT, DFT): continuous (non-discrete), but finite, or
            | rather periodic, in time.
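            | 
            | Laid out as the usual 2x2 grid:
            | 
            |                           continuous time    discrete time
            |     infinite / aperiodic  Fourier transform  DTFT
            |     finite / periodic     Fourier series     DFT (via FFT)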
           | 
            | Secondly, unlike Gabor transforms, wavelet transforms are
            | specifically _not_ just windowed Fourier anythings (whether
            | FT/FS/DFT/DTFT). Those are commonly called "short-time
            | Fourier transforms" (STFT, existing again in discrete and
            | non-discrete variants), and the article straight up
            | mentions in its footnotes that they don't fit either.
           | 
            | Wavelet transforms use an entirely different shape (e.g. a
            | Haar wavelet) that is shifted and stretched for analysis,
            | instead of a windowed sinusoid over a windowed signal.
           | 
           | And I think those distinctions are what the article actually
           | wanted to touch upon.
        
       | kazinator wrote:
       | > A Fourier transform has no explicit temporal precision, and
       | resembles something closer to the waveforms on the right; this is
       | not what the filters in the cochlea look like.
       | 
        | Perhaps the ear does something more vaguely analogous to a
        | discrete Fourier transform on samples of data, which is what
        | we do in a lot of signal processing.
       | 
       | In signal processing, we take windowed samples, and do discrete
       | transforms on these. These do give us some temporal precision.
       | 
        | There is a trade-off there between frequency and temporal
        | precision, analogous to the Pauli exclusion principle in quantum
       | mechanics. The better we know a frequency, the less precisely we
       | know the timing. Only an infinite, periodic signal has a single
       | precise frequency (or precise set of harmonics) which are
       | infinitely narrow blips in the frequency domain.
       | 
       | The continuous Fourier transform deals with periodic signals
       | only. We transform an entire function like sin(x) over the entire
       | domain. If that domain is interpreted as time, we are including
       | all of eternity, so to speak from negative infinite time to
       | positive.
        
         | xeonmc wrote:
          | > analogous to the Pauli exclusion principle
         | 
          | Did you mean the Heisenberg Uncertainty Principle instead? Or
          | is there actually some connection of the Pauli Exclusion
          | Principle to conjugate transforms that I wasn't aware of?
        
           | kvakkefly wrote:
           | They are not connected afaik.
        
         | HarHarVeryFunny wrote:
         | > There is a trade off there between frequency and temporal
         | precision
         | 
         | Sure, and the FFT isn't inherently biased towards one vs the
         | other. If you take an FFT over a long time window (narrowband
         | spectrogram) then you get good frequency resolution at the cost
         | of time resolution, and vice versa for a short time window
         | (wideband spectrogram).
         | 
         | For speech recognition ideally you'd want to use both since
         | they are detecting different things. TFA is saying that this is
         | in fact what our cochlea filter bank is doing, using different
         | types of filter at different frequency ranges - better
         | frequency resolution at lower frequencies where the formants
         | are (carrying articulatory information), and better time
         | resolution at the high frequencies generated by fricatives
         | where frequency doesn't matter but accurate onset detection is
         | useful for detecting plosives.
        
         | energy123 wrote:
         | STFT?
        
       | adornKey wrote:
        | This subject has bothered me for a long time. My question to
        | guys into acoustics was always: if the cochlea performs some
        | kind of Fourier transform, what are the chances that it uses
        | sinus waves as a base for the vector space? If it did anything
        | like that, it could just as well use any slightly different
        | wave-forms as a base for the transformation. Stiffness and non-
        | linearity will for sure ensure that any ideal rubber model from
        | physics will in reality differ from the perfect sinus.
        
         | FarmerPotato wrote:
         | I find it beautiful to see the term "sinus wave."
        
         | empiricus wrote:
          | well, the cochlea is working within the realm of biological
          | and physical possibilities. basically it is a triangle
          | through which waves are propagating, with sensors along the
          | edge. smth smth this is similar to a filter bank of gabor
          | filters that respond to rising freq along the triangle edge.
          | ergo you can say fourier, but it only means sensors
          | responding to different freq because of their location.
        
           | adornKey wrote:
            | Yeah, but not only the frequency is important - the wave-
            | form is very relevant. For example, if your wave-form is a
            | triangle, listeners will tell you that it is very noisy
            | compared to a simple sinus. If you use sinus as the base of
            | your vector space, triangles really do look like a noisy
            | mix. My question is whether the basic elements are really
            | sinus, or whether the basic Eigen-Waves of the cochlea are
            | other wave-forms (e.g. slightly wider or narrower than
            | sinus, ...). If physics in the ear isn't linear, maybe
            | sinus isn't the purest wave-form for a listener.
            | 
            | Most people in physics only know sinus and maybe sometimes
            | rectangles as a base for transformations, but
            | mathematically you could use a lot of other things - maybe
            | very similar to sinus, but different.
        
       | gowld wrote:
        | Why is there no box diagram for the cochlea "between wavelet
        | and Gabor"?
        
         | anticensor wrote:
          | It would still look too much like a wavelet.
        
       | shermantanktop wrote:
       | The thesis about human speech occupying less crowded spectrum is
       | well aligned with a book called "The Great Animal Orchestra"
       | (https://www.amazon.com/Great-Animal-Orchestra-Finding-
       | Origin...).
       | 
        | The author details how the "dawn chorus" is composed of a vast
        | number of species all making noise at once, each able to pick
        | out its own mating calls and other signals because they have
        | evolved their vocalizations into unique sonic niches.
       | 
       | It's quite interesting but also a bit depressing as he documents
       | the decline in intensity of this phenomenon with habitat
       | destruction etc.
        
         | HarHarVeryFunny wrote:
         | Birds have also evolved to choose when to vocalize to best be
         | heard - doing so earlier in urban areas where later there will
         | be more traffic noise, and later in some forest environments to
         | avoid being drowned out by the early rising noisy insects.
        
         | kulahan wrote:
          | Probably worth mentioning that as adaptations that allow them
          | to compete well in nature die out, ones that allow them to
          | compete well in cities take their place. Evolution is always
          | a series of tradeoffs.
          | 
          | Maybe we don't have sonic variation, but temporal variation
          | instead.
        
           | bitwize wrote:
           | Life uh, finds a way.
        
       | brcmthrowaway wrote:
       | OT: Does anyone here believe in Intelligent Design?
        
         | xeonmc wrote:
         | As low-level physical mechanistic processes? Absolutely not.
         | 
         | As higher-order, statistically transparent abstract nudges of
         | providence existing outside the confines of causality?
         | Metaphysically interesting but philosophically futile.
        
       | superb-owl wrote:
        | The title seems a little click-baity and basically wrong. Gabor
        | transforms, wavelet transforms, etc. are all generalizations of
        | the Fourier transform, which give you a spectrum analysis at
        | each point in time.
       | 
       | The content is generally good but I'd argue that the ear is
       | indeed doing very Fourier-y things.
        
       | debo_ wrote:
       | Fourear transform
        
       | antognini wrote:
       | If you want to get really deep into this, Richard Lyon has spent
       | decades developing the CARFAC model of human hearing: Cascade of
       | Asymmetric Resonators with Fast-Acting Compression. As far as I
       | know it's the most accurate digital model of human hearing.
       | 
       | He has a PDF of his book about human hearing on his website:
       | https://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018_smaller...
        
       | javier_e06 wrote:
       | This is fascinating.
       | 
        | I know of vocoders in military hardware that encode voices to
        | resemble something simpler for compression (a low-tone male
        | voice): smaller packets that take less bandwidth. The ear must
        | also have coevolved with our vocal cords and mouth to occupy
        | the available frequencies for transmission and reception, for
        | optimal communication.
       | 
       | The parallels with waveforms don't end there. Waveforms are also
       | optimized for different terrains (urban, jungle).
       | 
        | Are languages organic waveforms optimized for ethnicity and
        | terrain?
       | 
       | Cool article indeed.
        
       | rolph wrote:
       | supplemental:
       | 
       | Neuroanatomy, Auditory Pathway
       | 
       | https://www.ncbi.nlm.nih.gov/books/NBK532311/
       | 
       | Cochlear nerve and central auditory pathways
       | 
       | https://www.britannica.com/science/ear/Cochlear-nerve-and-ce...
       | 
       | Molecular Aspects of the Development and Function of Auditory
       | Neurons
       | 
       | https://pmc.ncbi.nlm.nih.gov/articles/PMC7796308/
        
       | fennec-posix wrote:
       | "It appears that human speech occupies a distinct time-frequency
       | space. Some speculate that speech evolved to fill a time-
       | frequency space that wasn't yet occupied by other existing
       | sounds."
       | 
        | I found this quite interesting, as I have noticed that I can
        | detect voices in high-noise environments, e.g. HF radio, where
        | noise is almost constant if you don't use a digital mode.
        
       | amelius wrote:
       | What does the continuous tingling of a hair cell sound like to
       | the subject?
        
       | xmcqdpt2 wrote:
        | Many versions of this article could be written:
        | 
        | The computer does not do a Fourier transform (the FFT computes
        | the discrete Fourier transform).
        | 
        | Spectroscopes don't do a Fourier transform (it's actually the
        | short-time FT).
        | 
        | The only thing that actually does a Fourier transform is a
        | mathematician, with a pen and some paper.
        
       | tim333 wrote:
       | Nice to see a video for the tip links and ion channels.
       | 
        | I spent a while reading up on that stuff because I was trying
        | to figure out what causes my tinnitus. My best guess is that if
        | the hairs bend too far, that machinery can break and an ion
        | channel can get stuck open, causing the cell to fire
        | continually.
        | 
        | Another fun ear fact is that ears incorporate active
        | amplification. You can hook an electrical signal up to the
        | loudspeaker-type cell to make it vibrate around:
        | https://youtu.be/pij8a8aNpWQ
        
       ___________________________________________________________________
       (page generated 2025-10-30 23:00 UTC)