hngopher.com

       [HN Gopher] Sesame CSM: A Conversational Speech Generation Model
       ___________________________________________________________________
        
       Author : tosh
       Score  : 98 points
       Date   : 2025-03-14 13:48 UTC (4 days ago)
        
        
       | zhyder wrote:
       | Any provider already hosting this (similar to how many providers
       | host Whisper for STT)? Looks like it doesn't support streaming tho
       | (same with Whisper coincidentally), but great to see open models
       | get so much better.
        
         | nshm wrote:
         | It is useless actually. Very slow and quality is suboptimal and
         | it is just the speech generation component. See discussion here:
         | 
         | https://github.com/SesameAILabs/csm/issues/80
        
       | learncomputer wrote:
       | Neat project! How does it handle different accents or speech
       | speeds--does it need a lot of training data for that? Excited to
       | see more open-source stuff in this space.
        
       | thehamkercat wrote:
       | Turns out it was a rug-pull
       | 
       | They open-sourced a crippled version of Sesame (1B)
       | 
       | not the one they're using in the actual demo
        
         | drivebyhooting wrote:
         | Very disappointing.
        
           | theoryofx wrote:
           | Really is. Seems like they lost their nerve. Their
           | credibility is in the toilet now. Clearly not good faith
           | actors.
           | 
           | Which is a shame because it seems like there's a real
           | opportunity to really shake things up with an open voice
           | model that's competitive with the proprietary ones.
           | 
           | Oh well. Someone else will do what they claimed to want to
           | do.
        
         | stevev wrote:
         | A16z won't allow it. For them there is money to be made.
        
       | simonw wrote:
       | If you want to try this on Mac this Python library worked for me:
       | https://github.com/senstella/csm-mlx
       | 
       | You can run it with uv like this:
       | 
       |     uv run --python 3.12 \
       |       --with "git+https://github.com/senstella/csm-mlx[cli]" \
       |       csm-mlx --text 'hello there' -o output.wav
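
For anyone who would rather script the uv invocation quoted above, here is a minimal Python sketch. Every flag and the package spec are copied from that comment. The helper names `build_csm_command` and `run_csm` are hypothetical, and the sketch assumes `uv` is already on your PATH.

```python
import shlex
import subprocess

# Package spec copied from the comment above; the [cli] extra is what the
# quoted command requests.
CSM_MLX_SPEC = "git+https://github.com/senstella/csm-mlx[cli]"


def build_csm_command(text, out_path="output.wav", python_version="3.12"):
    """Return the argv list for the uv invocation quoted in the comment."""
    return [
        "uv", "run",
        "--python", python_version,
        "--with", CSM_MLX_SPEC,
        "csm-mlx", "--text", text, "-o", out_path,
    ]


def run_csm(text, out_path="output.wav"):
    """Print the command and run it; raises CalledProcessError on failure."""
    cmd = build_csm_command(text, out_path)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
    return out_path
```

Calling `run_csm("hello there")` should be equivalent to the shell command in the comment. Passing the arguments as a list avoids shell quoting problems when the text contains apostrophes.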
        
         | gcr wrote:
         | this is great, thanks!!
         | 
         | is there any reason why it inserts multiple-seconds-long
         | awkward pauses into the output? are you seeing that behavior in
         | your example? (on a 2021 M1 Max MBP)
        
           | simonw wrote:
           | Oh interesting, no I haven't spotted that. I'm on an M2, but
           | I haven't spent a great deal of time poking at it with longer
           | inputs.
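
One way to check whether an output file really contains multi-second pauses, rather than judging by ear, is to scan it for long quiet stretches. The following is a rough standard-library sketch, not part of csm-mlx. It assumes the file is 16-bit PCM WAV (which may not be what the tool writes), and the function name `find_silent_gaps`, the amplitude threshold, and the 20 ms window are illustrative choices.

```python
import array
import sys
import wave


def find_silent_gaps(path, min_gap_s=1.0, threshold=500, frame_ms=20):
    """Return (start_s, end_s) spans whose peak amplitude stays below threshold.

    Assumption: 16-bit PCM WAV. threshold and frame_ms are illustrative.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    per_second = rate * channels
    window = max(1, rate * frame_ms // 1000) * channels
    gaps, gap_start = [], None
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        quiet = max(abs(s) for s in chunk) < threshold
        t = i / per_second
        if quiet and gap_start is None:
            gap_start = t  # a quiet stretch begins here
        elif not quiet and gap_start is not None:
            if t - gap_start >= min_gap_s:
                gaps.append((gap_start, t))
            gap_start = None
    # Close a quiet stretch that runs to the end of the file.
    end = len(samples) / per_second
    if gap_start is not None and end - gap_start >= min_gap_s:
        gaps.append((gap_start, end))
    return gaps
```

Running `find_silent_gaps("output.wav")` on files generated on different machines would give a concrete way to compare the pause behavior described above.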
        
       | low_tech_punk wrote:
       | OpenSesame would be a great project name!
        
       | cootsnuck wrote:
       | This just goes to show that, especially with Voice AI, people
       | should be thinking in terms of "systems" not "agents". Sesame
       | claims this slow barebones base model is what their demo is built
       | around. Regardless of whether that's true, it is true that
       | there's a whole lot more that goes into a slick demo like
       | Sesame's Maya and Miles than just hooking up a few models
       | together.
        
       | thegrim33 wrote:
       | The Sesame demo is really impressive but the fact that they
       | record and review your conversations makes it a complete
       | non-starter for me to actually use it. I can't feel comfortable
       | knowing some person could actually listen to everything we say.
       | Open sourcing it is great so I could self-host it, although it
       | seems like you can't quite get to something similar to the demo
       | from this so I'm not sure what the point is.
        
       ___________________________________________________________________
       (page generated 2025-03-18 23:01 UTC)