[HN Gopher] How AI hears accents: An audible visualization of ac...
___________________________________________________________________
How AI hears accents: An audible visualization of accent clusters
Author : ilyausorov
Score : 120 points
Date : 2025-10-14 16:07 UTC (6 hours ago)
(HTM) web link (accent-explorer.boldvoice.com)
(TXT) w3m dump (accent-explorer.boldvoice.com)
| dereknelson wrote:
| really fun discovery clicking a dot and hearing the accent. neat
| visualization, lots to think about!
| tmshapland wrote:
| Fascinating! How did you decouple the speaker-specific vocal
| characteristics (timbre, pitch range) from the accent-defining
| phonetic and prosodic features in the latent space?
| oscarfree wrote:
| We didn't explicitly. Because we finetuned this model for
| accent classification, the later transformer layers appear to
| ignore non-accent vocal characteristics. I verified this for
| gender for example.
| JakeLester wrote:
| Thank you for sharing! the 3d visual was an interesting
| application of the UMAP technique.
|
| Is there a way to subscribe to these blog posts for auto-
| notification?
| nosrepa wrote:
| Yeah, if only there was a protocol for that.
| bheadmaster wrote:
| It would have taken you a second more to type out "RSS", and
| turn a sarcastic comment into an informative one.
|
| Obligatory xkcd: https://xkcd.com/1053/
| ahstilde wrote:
| Why is Spanish so distributed?
| ilyausorov wrote:
| Good question! It's likely because there are lots of different
| accents of Spanish that are distinct from each other. Our
| labels only capture the native language of the speaker right
| now, so they're all grouped together but it's definitely on our
| to-do list to go deeper into the sub accents of each language
| family!
| bikeshaving wrote:
| Spanish is one of those languages I would love to see as a
| breakdown by country. I'm sure Chilean Spanish looks very
| different from Catalonian Spanish.
| rkomorn wrote:
| Did you mean Catalan (which is not Spanish) or Castilian
| Spanish?
| bikeshaving wrote:
| Yes the Spanish spoken in Spain, especially the one
| that's like /'grathjas/ and /barthe'lona/.
| djmips wrote:
| But Spanish sounds very different in Spain depending on
| what region of the country you are talking about.
| oscarfree wrote:
| Not sure, could be the large number of Spanish dialects
| represented in the dataset, label noise, or something else.
| There may just be too much diversity in the class to fit neatly
| in a cluster.
|
| Also, the training dataset is highly imbalanced and Spanish is
| the most common class, so the model predicts it as a sort of
| default when it isn't confident -- this could lead to artifacts
| in the reduced 3d space.
| zaouiamine wrote:
| This is a fascinating look at how AI interprets accents! It
| reminds me of some recent advancements in speech recognition
| tech, like Google's Dialect Recognition feature, which also
| attempts to adapt to different accents. I wonder how these models
| could be improved further to not just recognize but also
| appreciate the nuances of regional
| afiodorov wrote:
| Apparently Persian and Russian are close, which is surprising to
| say the least. I know people keep getting confused about how
| Portuguese from Portugal and Russian sound close, but the Persian
| connection is new to me.
| zehaeva wrote:
| When I went to Portugal I was struck by how much Portuguese
| there does sound like Spanish with a Russian accent!
| oscarfree wrote:
| Part of this is the "dark L" sound
| BalinKing wrote:
| I'd guess that the sibilants, consonant clusters, and/or
| vowel reduction would play a big role.
| binary132 wrote:
| I thought I was the only one who perceived an audible
| similarity between Portuguese and Russian.
| mh- wrote:
| I speak neither, and both also sound similar to me depending
| on the accents of the speakers.
| djmips wrote:
| I had that too but it was Brazilian Portuguese where I
| noticed it.
| CGMthrowaway wrote:
| Idea: Farsi and Russian both have a simple list of vowel sounds
| and no diphthongs, which makes it hard (and obvious) when
| attempting to speak English, which is rife with them and has
| many different vowel sounds.
| ilyausorov wrote:
| Yeh they seem to be in the same "major" cluster, although
| Serbian/Croatian, Romanian, Bulgarian, Turkish, Polish and
| Czech are all close.
|
| Turkish and Persian seem to be the nearest neighbors.
| zman0225 wrote:
| Going mono-tonal to that of an expressive ebook increased my
| "American English" score from a 52% to 92%.
|
| I'd suggest training a little less on audio books.
| djmips wrote:
| What does mono-tonal mean, and what is an expressive ebook? I
| assume you are not American born? I had been of the
| understanding that rhythm was more important than the exact
| sounds in comprehension.
| bikeshaving wrote:
| The source code for this is unminified and very readable if
| you're one of the rare few who has interesting latent spaces to
| visualize.
|
| https://accent-explorer.boldvoice.com/script.js?v=5
| ilyausorov wrote:
| Nothing too secret in there! We anonymized everything and
| anyway it's just a basic Plotly plot. Feel free to check it
| out.
| 3abiton wrote:
| Good catch. I really hate JavaScript so I never got into d3.js,
| so Plotly was such a life saver.
| ilyausorov wrote:
| Plotly is great! Much love.
| dcreater wrote:
| What's the dimensionality of the latent space? How were the 3
| visualized dimensions selected?
| oscarfree wrote:
| 12 layers of 768-dim each. The 3 dimensions visualized are
| chosen by UMAP.
| lynchdt wrote:
| Irish accent appears to break it.
| oscarfree wrote:
| We are working on this - we don't have quite enough Irish
| speech data.
| diegolas wrote:
| It would've been nice to be able to visualize the differences
| between the different accents in the Spanish language. Really
| cool tho.
| ilyausorov wrote:
| Yeh, we would've loved to see that too. It's on our roadmap for
| sure. Same for some of the other languages with a large number
| of distinct accents, e.g. French, Chinese, Arabic.
| johnwatson11218 wrote:
| I just got a project running whereby I used Python + pdfplumber
| to read in 1100 PDF files, most of my Humble Bundle collection. I
| extracted the text and dumped it into a 'documents' table in
| PostgreSQL. Then I used sentence transformers to reduce each 1K
| chunk to a single 384D vector which I wrote back to the db. Then
| I averaged these to produce a document level embedding as a
| single vector.
|
| Then I was able to apply UMAP + HDBSCAN to this dataset and it
| produced a 2D plot of all my books. Later I put the discovered
| topic back in the db and used that to compute tf-idf for my
| clusters from which I could pick the top 5 terms to serve as a
| crude cluster label.
|
| It took about 20 to 30 hours to finish all these steps and I was
| very impressed with the results. I could see my cookbooks clearly
| separated from my programming and math books. I could drill in
| and see subclusters for baking, bbq, salads etc.
|
| Currently I'm putting it into a two-container Docker Compose file:
| base PostgreSQL plus a Python container I'm working on.
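
For readers who want to try two of the steps described in the comment above (averaging chunk embeddings into a document vector, and picking the top tf-idf terms as a crude cluster label), here is a minimal sketch. It is not the commenter's code. The function names `mean_pool` and `top_terms_per_cluster` are hypothetical, tokenization is plain whitespace splitting, and the smoothed idf formula is one common choice picked for illustration. The embedding and clustering steps (sentence transformers, UMAP, HDBSCAN) are left out; the sketch only assumes you already have chunk vectors and one cluster label per document.

```python
import math
from collections import Counter, defaultdict


def mean_pool(chunk_vectors):
    """Average equal-length chunk vectors into one document-level vector."""
    if not chunk_vectors:
        raise ValueError("need at least one chunk vector")
    dim = len(chunk_vectors[0])
    n = len(chunk_vectors)
    return [sum(vec[i] for vec in chunk_vectors) / n for i in range(dim)]


def top_terms_per_cluster(docs, labels, k=5):
    """Treat each cluster as one large document and rank its terms by tf-idf.

    docs   -- list of document texts
    labels -- one cluster label per document; -1 (the HDBSCAN noise label)
              is skipped
    """
    counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        if label == -1:
            continue
        counts[label].update(text.lower().split())

    n_clusters = len(counts)
    # Cluster frequency: in how many clusters does each term appear?
    cluster_freq = Counter(term for c in counts.values() for term in c)

    result = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores = {
            # Assumption: smoothed idf, so a term present in every
            # cluster scores exactly zero.
            term: (n / total) * math.log((1 + n_clusters) / (1 + cluster_freq[term]))
            for term, n in c.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[label] = ranked[:k]
    return result


if __name__ == "__main__":
    docs = [
        "bake bread flour oven",
        "bake cake flour sugar",
        "python code loop",
        "python code class",
    ]
    labels = [0, 0, 1, 1]
    print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))
    print(top_terms_per_cluster(docs, labels, k=2))
```

In a real run you would probably want a stop-word list and a better tokenizer, since common words that appear in only some clusters can otherwise crowd out the informative terms.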
| mertbozkir wrote:
| I love BoldVoice
| ilyausorov wrote:
| Thanks, we love you too
| ccheever wrote:
| Very interesting
___________________________________________________________________
(page generated 2025-10-14 23:00 UTC)