[HN Gopher] Transcribro: On-device Accurate Speech-to-text
       ___________________________________________________________________
        
       Transcribro: On-device Accurate Speech-to-text
        
       Author : thebiblelover7
       Score  : 46 points
       Date   : 2024-07-18 17:25 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | crancher wrote:
       | Accrescent hype is comically overdone.
        
         | free_bip wrote:
         | I looked in the GitHub issues and there's a closed issue for
         | F-droid inclusion. The author states that F-droid "Doesn't meet
         | their requirements" but doesn't elaborate. I wonder what
         | F-droid is missing that they need so much?
        
       | flax wrote:
       | Documentation severely lacking. I wanted to know whether this
       | does streaming or only batch, as well as examples for integrating
       | with Android apps.
        
         | pants2 wrote:
         | Considering it uses Whisper, it's probably not streaming
        
           | refulgentis wrote:
           | I did some core work on TTS at Google, at several layers, and
           | I've never quite understood what people mean by streaming vs.
           | not.
           | 
           | In each and every case I'm familiar with, streaming means
           | "send the whole audio thus far to the inference engine,
           | inference it, and send back the transcription"
           | 
           | I have a Flutter library that does the same flow as this
           | (though via ONNX, so I can cover all platforms), and Whisper
           | + Silero is ~identical to the interfaces I used at Google.
           | 
           | If the idea is streaming is when each audio byte is only sent
           | once to the server, there's still an audio buffer accumulated
           | -- it's just on the server.
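
A minimal sketch of the flow described above, where "streaming" means re-running inference over all the audio accumulated so far and returning the latest transcription. The names `stream_transcribe` and `fake_transcribe` are hypothetical, and the fake model simply decodes bytes as text so the example runs without a real engine such as Whisper.

```python
from typing import Callable, Iterable, Iterator


def stream_transcribe(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
) -> Iterator[str]:
    """Yield a partial transcription after each incoming audio chunk.

    Each new chunk is appended to a growing buffer and the WHOLE buffer
    is sent to the inference function again.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        yield transcribe(bytes(buffer))


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for a real model call: treats bytes as ASCII text.
    return audio.decode("ascii")


partials = list(stream_transcribe([b"hello", b" wor", b"ld"], fake_transcribe))
print(partials)
```

Whether the buffer lives on the client or on a server, the inference step still sees the accumulated audio; only the place where it is stored changes.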
        
             | iamjackg wrote:
             | I think in practical terms (at least for me):
             | 
             | - streaming == I talk and the text appears as I talk
             | 
             | - batched == I talk, and after I'm done talking some
             | processing happens and the text gets populated
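
For contrast, the "batched" behavior in the list above could be sketched roughly as follows: collect audio until the speaker goes quiet, then transcribe the whole utterance once. This is only an illustration under stated assumptions. The peak-amplitude threshold is a crude stand-in for a real voice activity detector (the thread mentions Silero), and `batch_transcribe`, `fake_transcribe`, and both parameters are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List


def batch_transcribe(
    frames: Iterable[List[int]],
    transcribe: Callable[[List[int]], str],
    silence_threshold: int = 100,  # assumed peak amplitude below which a frame is "quiet"
    silence_frames: int = 2,       # assumed number of quiet frames that ends an utterance
) -> Iterator[str]:
    """Yield one transcription per utterance, only after trailing silence."""
    utterance: List[int] = []
    quiet = 0
    for frame in frames:
        peak = max((abs(sample) for sample in frame), default=0)
        if peak >= silence_threshold:
            utterance.extend(frame)
            quiet = 0
        elif utterance:
            quiet += 1
            if quiet >= silence_frames:
                yield transcribe(utterance)
                utterance = []
                quiet = 0
    if utterance:
        # Flush whatever speech remains when the audio ends.
        yield transcribe(utterance)


def fake_transcribe(samples: List[int]) -> str:
    # Hypothetical stand-in for a real model call.
    return f"{len(samples)} samples"


frames = [[500, 600], [700, -800], [0, 1], [2, 0], [900, 900], [0, 0]]
results = list(batch_transcribe(frames, fake_transcribe))
print(results)
```

Nothing is emitted while the user is still talking, which is the user-visible difference from the "text appears as I talk" case.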
        
               | refulgentis wrote:
               | Gotcha, then, it's "not even wrong" in the Pauli sense to
               | say Whisper isn't streaming
        
             | flax wrote:
             | "Streaming" in this case is what another reply said:
             | transcriptions appear as I talk. In the non-streaming
             | case, the service waits for silence, processes the
             | captured speech, and then returns a transcription.
             | 
             | Is your Flutter library available? And does it run locally?
             | I'm looking for a good Flutter streaming (in the sense
             | above) speech recognition library. vosk looks good, but
             | it's lacking some configurability such as selecting audio
             | source.
        
               | refulgentis wrote:
               | FONNX. I haven't gone out of my way to make it trivial[1],
               | but it's very good and battle-tested on every single
               | platform. (And yes, it runs locally.)
               | 
               | [1] The example app shows how to do everything and there's
               | basic documentation, but the amount of nonsense you need to
               | know to pull it all together is just too hard to document
               | without a specific question. Do feel free to file an issue.
        
       | james2doyle wrote:
       | Looks similar to the new FUTO keyboard:
       | https://voiceinput.futo.org/
        
         | iamjackg wrote:
         | I've been using this for a while (the voice input, not their
         | keyboard) and it's so refreshing to be able to just speak and
         | have the output come out as fully formed, well punctuated
         | sentences with proper capitalization.
        
           | james2doyle wrote:
           | I agree. No more "speaking punctuation". Just talk as normal
           | and it comes out fully formed
        
             | freedomben wrote:
             | I actually don't mind speaking punctuation, in fact it kind
             | of helps. What I really hate is the middle-spot where we
             | are right now, where it tries to place punctuation and
             | sucks badly at it.
        
         | leobg wrote:
         | Anything like that available for iOS?
        
         | yjftsjthsd-h wrote:
         | But open source, which is a pretty big difference
        
           | grandma_tea wrote:
           | FUTO and Transcribro are open source.
        
             | yencabulator wrote:
             | FUTO is not open source.
             | 
             | https://gitlab.futo.org/alex/voiceinput/-/blob/master/LICENS...
             | 
             | > FUTO Source First License 1.0
             | 
             | > You may use or modify the software only for
             | non-commercial purposes
        
             | Humbly8967 wrote:
             | No, FUTO made a new "Source First License"[1] that is not
             | Open Source by the OSI definition.
             | 
             | [1] https://github.com/futo-org/android-keyboard/blob/master/LIC...
        
       ___________________________________________________________________
       (page generated 2024-07-18 23:01 UTC)