[HN Gopher] Sesame CSM: A Conversational Speech Generation Model
___________________________________________________________________
Sesame CSM: A Conversational Speech Generation Model
Author : tosh
Score : 98 points
Date : 2025-03-14 13:48 UTC (4 days ago)
(HTM) web link (github.com)
| zhyder wrote:
| Any provider already hosting this (similar to how many providers
| host Whisper for STT)? Looks like it doesn't support streaming,
| though (same as Whisper, coincidentally), but great to see open
| models get so much better.
| nshm wrote:
| It is actually useless: very slow, the quality is suboptimal, and
| it is just the speech generation component. See discussion here:
|
| https://github.com/SesameAILabs/csm/issues/80
| learncomputer wrote:
| Neat project! How does it handle different accents or speech
| speeds--does it need a lot of training data for that? Excited to
| see more open-source stuff in this space.
| thehamkercat wrote:
| Turns out it was a rug-pull
|
| They open-sourced a crippled version of Sesame (1B),
|
| not the one they're using in the actual demo
| drivebyhooting wrote:
| Very disappointing.
| theoryofx wrote:
| Really is. Seems like they lost their nerve. Their
| credibility is in the toilet now. Clearly not good faith
| actors.
|
| Which is a shame, because it seems like there's a real
| opportunity to shake things up with an open voice
| model that's competitive with the proprietary ones.
|
| Oh well. Someone else will do what they claimed to want to
| do.
| stevev wrote:
| A16z won't allow it. For them there is money to be made.
| simonw wrote:
| If you want to try this on Mac this Python library worked for me:
| https://github.com/senstella/csm-mlx
|
| You can run it with uv like this:
|
|   uv run --python 3.12 \
|     --with "git+https://github.com/senstella/csm-mlx[cli]" \
|     csm-mlx --text 'hello there' -o output.wav
| gcr wrote:
| this is great, thanks!!
|
| is there any reason why it inserts multiple-seconds-long
| awkward pauses into the output? are you seeing that behavior in
| your example? (on a 2021 M1 Max MBP)
| simonw wrote:
| Oh interesting, no I haven't spotted that. I'm on an M2, but
| I haven't spent a great deal of time poking at it with longer
| inputs.
| low_tech_punk wrote:
| OpenSesame would be a great project name!
| cootsnuck wrote:
| This just goes to show that, especially with Voice AI, people
| should be thinking in terms of "systems" not "agents". Sesame
| claims this slow barebones base model is what their demo is built
| around. Whether or not that's true, it is true that
| there's a whole lot more that goes into a slick demo like
| Sesame's Maya and Miles than just hooking up a few models
| together.
| thegrim33 wrote:
| The Sesame demo is really impressive, but the fact that they
| record and review your conversations makes it a complete
| non-starter for me to actually use it. I can't feel comfortable
| knowing some person could actually listen to everything we say.
| Open sourcing it is great so I could self-host it, although it
| seems like you can't quite get to something similar to the demo
| from this so I'm not sure what the point is.
___________________________________________________________________
(page generated 2025-03-18 23:01 UTC)