hngopher.com

       [HN Gopher] How AI hears accents: An audible visualization of ac...
       ___________________________________________________________________
        
       How AI hears accents: An audible visualization of accent clusters
        
       Author : ilyausorov
       Score  : 120 points
       Date   : 2025-10-14 16:07 UTC (6 hours ago)
        
 (HTM) web link (accent-explorer.boldvoice.com)
 (TXT) w3m dump (accent-explorer.boldvoice.com)
        
       | dereknelson wrote:
       | really fun discovery clicking a dot and hearing the accent. neat
       | visualization, lots to think about!
        
       | tmshapland wrote:
       | Fascinating! How did you decouple the speaker-specific vocal
       | characteristics (timbre, pitch range) from the accent-defining
       | phonetic and prosodic features in the latent space?
        
         | oscarfree wrote:
         | We didn't explicitly. Because we finetuned this model for
         | accent classification, the later transformer layers appear to
         | ignore non-accent vocal characteristics. I verified this for
         | gender for example.
        
       | JakeLester wrote:
       | Thank you for sharing! The 3D visual was an interesting
       | application of the UMAP technique.
       | 
       | Is there a way to subscribe to these blog posts for auto-
       | notification?
        
         | nosrepa wrote:
         | Yeah, if only there was a protocol for that.
        
           | bheadmaster wrote:
           | It would have taken you a second more to type out "RSS", and
           | turn a sarcastic comment into an informative one.
           | 
           | Obligatory xkcd: https://xkcd.com/1053/
        
       | ahstilde wrote:
       | Why is Spanish so distributed?
        
         | ilyausorov wrote:
         | Good question! It's likely because there are lots of different
         | accents of Spanish that are distinct from each other. Our
         | labels only capture the native language of the speaker right
         | now, so they're all grouped together but it's definitely on our
         | to-do list to go deeper into the sub accents of each language
         | family!
        
           | bikeshaving wrote:
           | Spanish is one of those languages I would love to see as a
           | breakdown by country. I'm sure Chilean Spanish looks very
           | different from Catalonian Spanish.
        
             | rkomorn wrote:
             | Did you mean Catalan (which is not Spanish) or Castilian
             | Spanish?
        
               | bikeshaving wrote:
               | Yes the Spanish spoken in Spain, especially the one
               | that's like /'grathjas/ and /barthe'lona/.
        
               | djmips wrote:
               | But Spanish sounds very different in Spain depending on
               | what region of the country you are talking about.
        
         | oscarfree wrote:
         | Not sure, could be the large number of Spanish dialects
         | represented in the dataset, label noise, or something else.
         | There may just be too much diversity in the class to fit neatly
         | in a cluster.
         | 
         | Also, the training dataset is highly imbalanced and Spanish is
         | the most common class, so the model predicts it as a sort of
         | default when it isn't confident -- this could lead to artifacts
         | in the reduced 3d space.
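
A toy sketch of the "default class" effect described above, under loud
assumptions: the label counts are made up, and a simple Bayes-rule
classifier stands in for the fine-tuned model (which is not what the
authors use). It only illustrates why ambiguous inputs tend to land on
the most common training class.

```python
from collections import Counter

# Hypothetical, made-up label counts for an imbalanced training set.
train_labels = ["Spanish"] * 60 + ["French"] * 25 + ["Irish"] * 15
counts = Counter(train_labels)
priors = {c: n / len(train_labels) for c, n in counts.items()}


def posterior(likelihoods, priors):
    """Bayes rule: weight per-class evidence by the class priors."""
    unnorm = {c: likelihoods[c] * priors[c] for c in priors}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def predict(likelihoods):
    """Return the class with the highest posterior probability."""
    post = posterior(likelihoods, priors)
    return max(post, key=post.get)


# An ambiguous clip: the evidence favours no class in particular,
# so the prior (class frequency) decides the outcome.
ambiguous = {"Spanish": 1.0, "French": 1.0, "Irish": 1.0}

# A clear clip: strong evidence for the rarest class overrides the prior.
clear = {"Spanish": 0.05, "French": 0.05, "Irish": 0.90}

print(predict(ambiguous))
print(predict(clear))
```

In this toy setup the ambiguous clip is assigned to the majority class,
while the clear clip is not. Reweighting or resampling the classes is
one common way to soften that pull.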
        
       | zaouiamine wrote:
       | This is a fascinating look at how AI interprets accents! It
       | reminds me of some recent advancements in speech recognition
       | tech, like Google's Dialect Recognition feature, which also
       | attempts to adapt to different accents. I wonder how these models
       | could be improved further to not just recognize but also
       | appreciate the nuances of regional
        
       | afiodorov wrote:
       | Apparently Persian and Russian are close. Which is surprising to
       | say the least. I know people keep getting confused about how
       | Portuguese from Portugal and Russian sound close yet the Persian
       | is new to me.
        
         | zehaeva wrote:
         | When I went to Portugal I was struck by how much Portuguese
         | there does sound like Spanish with a Russian accent!
        
           | oscarfree wrote:
           | Part of this is the "dark L" sound
        
             | BalinKing wrote:
             | I'd guess that the sibilants, consonant clusters, and/or
             | vowel reduction would play a big role.
        
         | binary132 wrote:
         | I thought I was the only one who perceived an audible
         | similarity between Portuguese and Russian.
        
           | mh- wrote:
           | I speak neither, and both also sound similar to me depending
           | on the accents of the speakers.
        
           | djmips wrote:
           | I had that too but it was Brazilian Portuguese where I
           | noticed it.
        
         | CGMthrowaway wrote:
         | Idea: Farsi and Russian both have a simple list of vowel
         | sounds and no diphthongs. That makes it hard/obvious when
         | attempting to speak English, which is rife with them and has
         | many different vowel sounds.
        
         | ilyausorov wrote:
         | Yeh they seem to be in the same "major" cluster, although
         | Serbian/Croatian, Romanian, Bulgarian, Turkish, Polish and
         | Czech are all close.
         | 
         | Turkish and Persian seem to be the nearest neighbors.
        
       | zman0225 wrote:
       | Going mono-tonal to that of an expressive ebook increased my
       | "American English" score from a 52% to 92%.
       | 
       | I'd suggest training a little less on audio books.
        
         | djmips wrote:
         | What does mono-tonal mean, and what is an expressive ebook? I
         | assume you are not American born? I had been of the
         | understanding that rhythm was more important than the exact
         | sounds in comprehension.
        
       | bikeshaving wrote:
       | The source code for this is unminified and very readable if
       | you're one of the rare few who has interesting latent spaces to
       | visualize.
       | 
       | https://accent-explorer.boldvoice.com/script.js?v=5
        
         | ilyausorov wrote:
         | Nothing too secret in there! We anonymized everything and
         | anyway it's just a basic Plotly plot. Feel free to check it
         | out.
        
         | 3abiton wrote:
         | Good catch. I really hate JavaScript so I never got into
         | D3.js, so Plotly was such a life saver.
        
           | ilyausorov wrote:
           | Plotly is great! Much love.
        
       | dcreater wrote:
       | What's the dimensionality of the latent space? How were the 3
       | visualized dimensions selected?
        
         | oscarfree wrote:
         | 12 layers of 768-dim each. The 3 dimensions visualized are
         | chosen by UMAP.
        
       | lynchdt wrote:
       | Irish accent appears to break it.
        
         | oscarfree wrote:
         | We are working on this - we don't have quite enough Irish
         | speech data.
        
       | diegolas wrote:
       | it would've been nice to be able to visualize the differences
       | between the different accents in the Spanish language, really
       | cool tho
        
         | ilyausorov wrote:
         | Yeh, we would've loved to see that too. It's on our roadmap for
         | sure. Same for some of the other languages with a large number
         | of unique accents, e.g. French, Chinese, Arabic, etc.
        
       | johnwatson11218 wrote:
       | I just got a project running whereby I used Python + pdfplumber
       | to read in 1100 PDF files, most of my Humble Bundle collection. I
       | extracted the text and dumped it into a 'documents' table in
       | PostgreSQL. Then I used sentence transformers to reduce each 1K
       | chunk to a single 384D vector which I wrote back to the db. Then
       | I averaged these to produce a document level embedding as a
       | single vector.
       | 
       | Then I was able to apply UMAP + HDBSCAN to this dataset and it
       | produced a 2D plot of all my books. Later I put the discovered
       | topic back in the db and used that to compute tf-idf for my
       | clusters from which I could pick the top 5 terms to serve as a
       | crude cluster label.
       | 
       | It took about 20 to 30 hours to finish all these steps and I was
       | very impressed with the results. I could see my cookbooks clearly
       | separated from my programming and math books. I could drill in
       | and see subclusters for baking, bbq, salads etc.
       | 
       | Currently I'm putting it into a 2-container Docker Compose file:
       | base PostgreSQL + a Python container I'm working on.
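
Two steps of the pipeline above can be sketched with the standard
library alone: averaging chunk vectors into a document vector, and
ranking tf-idf terms to label a cluster. This is an illustration under
assumptions, not the commenter's code. The helper names (`mean_pool`,
`tokenize`, `top_terms`) are hypothetical, the vectors are 3-D toys
rather than 384-D sentence-transformer output, the tf-idf formula is
one common variant, and the cluster ids are given by hand instead of
coming from UMAP + HDBSCAN.

```python
import math
import re
from collections import Counter


def mean_pool(chunk_vectors):
    """Average equal-length chunk vectors into one document vector."""
    n = len(chunk_vectors)
    return [sum(dim) / n for dim in zip(*chunk_vectors)]


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as terms."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(cluster_texts, k=5):
    """Rank each cluster's terms by tf-idf, treating the concatenated
    text of a cluster as a single 'document'."""
    term_counts = {c: Counter(tokenize(t)) for c, t in cluster_texts.items()}
    n_clusters = len(term_counts)

    # Document frequency: in how many clusters does each term occur?
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())

    labels = {}
    for cluster, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (n / total) * math.log(n_clusters / df[term])
            for term, n in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[cluster] = ranked[:k]
    return labels


# Toy stand-ins for per-chunk embeddings of one document.
doc_vector = mean_pool([[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])

# Toy stand-ins for the text of two discovered clusters.
labels = top_terms(
    {
        0: "bake the bread and bake the cake",
        1: "compile the code and debug the code",
    },
    k=2,
)
print(doc_vector)
print(labels)
```

Terms that occur in every cluster (here "the" and "and") get an idf of
zero, so they drop to the bottom of the ranking without a stop-word
list.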
        
       | mertbozkir wrote:
       | i love boldvoice
        
         | ilyausorov wrote:
         | Thanks, we love you too
        
       | ccheever wrote:
       | Very interesting
        
       ___________________________________________________________________
       (page generated 2025-10-14 23:00 UTC)