[HN Gopher] Voxtral - Frontier open source speech understanding ...
___________________________________________________________________
Voxtral - Frontier open source speech understanding models
Author : meetpateltech
Score : 33 points
Date : 2025-07-15 14:47 UTC (8 hours ago)
(HTM) web link (mistral.ai)
(TXT) w3m dump (mistral.ai)
| danelski wrote:
| They claim to undercut competitors of similar quality by half for
| both models, yet they released both as Apache 2.0 instead of
| following the smaller-open, larger-closed strategy of their last
| releases. What's different here?
| Havoc wrote:
| Probably not looking to directly compete in transcription space
| wmf wrote:
| They're working on a bunch of features so maybe those will be
| closed. I guess they're feeling generous on the base model.
| homarp wrote:
| weights: https://huggingface.co/mistralai/Voxtral-Mini-3B-2507 and
| https://huggingface.co/mistralai/Voxtral-Small-24B-2507
| homarp wrote:
| Running Voxtral-Mini-3B-2507 on GPU requires ~9.5 GB of GPU RAM
| in bf16 or fp16.
|
| Running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU
| RAM in bf16 or fp16.
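[The figures above are roughly consistent with a weights-only estimate of
parameter count x 2 bytes per bf16/fp16 parameter, plus runtime overhead.
A minimal sketch; the 1.2x overhead factor is an assumption, not a figure
from the post, and the real Mini footprint quoted above (~9.5 GB for 3B)
is noticeably higher than weights alone, presumably due to the audio
encoder and runtime buffers:]

```python
def estimate_vram_gb(params_billions: float,
                     bytes_per_param: int = 2,
                     overhead: float = 1.2) -> float:
    """Rough inference VRAM estimate in GiB.

    Weights stored in bf16/fp16 take 2 bytes per parameter; `overhead`
    is an assumed fudge factor for activations and the KV cache.
    """
    weights_gib = params_billions * 1e9 * bytes_per_param / 1024**3
    return weights_gib * overhead

# Voxtral-Small-24B: weights alone ~44.7 GiB; with 1.2x overhead ~53.6 GiB,
# in the same ballpark as the ~55 GB quoted above.
print(round(estimate_vram_gb(24), 1))
# Voxtral-Mini-3B: weights alone ~5.6 GiB -- the quoted ~9.5 GB implies
# substantially more overhead for the smaller model.
print(round(estimate_vram_gb(3), 1))
```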
| GaggiX wrote:
| There is also a Voxtral Small 24B model available for download:
| https://huggingface.co/mistralai/Voxtral-Small-24B-2507
| lostmsu wrote:
| My Whisper v3 Large Turbo is $0.001/min, so their price
| comparison is not exactly perfect.
| ImageXav wrote:
| How did you achieve that? I was looking into it and $0.006/min
| is quoted everywhere.
| lostmsu wrote:
| Harvesting idle compute. https://borgcloud.org/speech-to-text
| BetterWhisper wrote:
| Do you support speaker recognition?
| lostmsu wrote:
| Does it support realtime transcription? What is the ~latency?
___________________________________________________________________
(page generated 2025-07-15 23:00 UTC)