[HN Gopher] Voxtral - Frontier open source speech understanding ...
       ___________________________________________________________________
        
       Voxtral - Frontier open source speech understanding models
        
       Author : meetpateltech
       Score  : 33 points
       Date   : 2025-07-15 14:47 UTC (8 hours ago)
        
 (HTM) web link (mistral.ai)
 (TXT) w3m dump (mistral.ai)
        
       | danelski wrote:
       | They claim to undercut competitors of similar quality by half for
       | both models, yet they released both under Apache 2.0 instead of
       | following the "smaller is open, larger is closed" strategy used for
       | their last releases. What's different here?
        
         | Havoc wrote:
         | They're probably not looking to compete directly in the
         | transcription space.
        
         | wmf wrote:
         | They're working on a bunch of features so maybe those will be
         | closed. I guess they're feeling generous on the base model.
        
       | homarp wrote:
       | Weights: https://huggingface.co/mistralai/Voxtral-Mini-3B-2507 and
       | https://huggingface.co/mistralai/Voxtral-Small-24B-2507
        
         | homarp wrote:
         | Running Voxtral-Mini-3B-2507 on GPU requires ~9.5 GB of GPU RAM
         | in bf16 or fp16.
         | 
         | Running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU
         | RAM in bf16 or fp16.
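
A rough way to read the memory figures quoted above: bf16 and fp16 both store 2 bytes per parameter, so the weights alone account for most of the total. The sketch below assumes the nominal parameter counts implied by the model names (3B and 24B). The real checkpoints also include an audio encoder, so the true weight footprint may be somewhat larger. Whatever remains of the quoted total would then cover that extra, plus activations, cache, and runtime overhead.

```python
# Back-of-the-envelope estimate only. Parameter counts are taken from the
# model names (an assumption), not from the actual checkpoints.
BYTES_PER_PARAM_16BIT = 2  # bf16 and fp16 both use 2 bytes per parameter


def weights_gb(nominal_params_billion: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return nominal_params_billion * 1e9 * BYTES_PER_PARAM_16BIT / 1e9


# (nominal parameters in billions, GPU RAM in GB as quoted in the thread)
models = {
    "Voxtral-Mini-3B-2507": (3, 9.5),
    "Voxtral-Small-24B-2507": (24, 55),
}

for name, (params_b, quoted_gb) in models.items():
    w = weights_gb(params_b)
    # The remainder is everything the quoted figure includes beyond
    # the nominal weights.
    print(f"{name}: ~{w:.0f} GB nominal weights, "
          f"~{quoted_gb - w:.1f} GB of the quoted {quoted_gb} GB is other usage")
```

The same arithmetic suggests why the larger model does not fit on a single 48 GB card in 16-bit precision: the nominal weights alone already reach that size.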
        
       | GaggiX wrote:
       | There is also a Voxtral Small 24B model available to download:
       | https://huggingface.co/mistralai/Voxtral-Small-24B-2507
        
       | lostmsu wrote:
       | My Whisper v3 Large Turbo is $0.001/min, so their price
       | comparison is not exactly perfect.
        
         | ImageXav wrote:
         | How did you achieve that? I was looking into it and $0.006/min
         | is quoted everywhere.
        
           | lostmsu wrote:
           | Harvesting idle compute. https://borgcloud.org/speech-to-text
        
             | BetterWhisper wrote:
             | Do you support speaker recognition?
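
To put the two per-minute prices quoted in this subthread ($0.001/min and $0.006/min) in perspective, here is a minimal sketch that scales them to a batch of audio. The 1000-hour workload and the function name are hypothetical, chosen only for illustration. The calculation ignores minimum billing increments and any volume discounts.

```python
def transcription_cost_usd(audio_hours: float, usd_per_minute: float) -> float:
    """Total cost of transcribing `audio_hours` of audio at a flat per-minute rate."""
    return audio_hours * 60 * usd_per_minute


AUDIO_HOURS = 1000  # hypothetical workload, not a figure from the thread

# The two rates mentioned in the thread.
for label, rate in [("$0.001/min", 0.001), ("$0.006/min", 0.006)]:
    cost = transcription_cost_usd(AUDIO_HOURS, rate)
    print(f"{label}: ${cost:,.2f} for {AUDIO_HOURS} hours of audio")
```

At a flat rate the ratio between the two totals is the same 6x as the ratio between the per-minute prices, regardless of volume.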
        
       | lostmsu wrote:
       | Does it support real-time transcription? What is the approximate
       | latency?
        
       ___________________________________________________________________
       (page generated 2025-07-15 23:00 UTC)