[HN Gopher] Transcribro: On-device Accurate Speech-to-text
___________________________________________________________________
Transcribro: On-device Accurate Speech-to-text
Author : thebiblelover7
Score : 46 points
Date : 2024-07-18 17:25 UTC (5 hours ago)
(HTM) web link (github.com)
| crancher wrote:
| Accrescent hype is comically overdone.
| free_bip wrote:
| I looked in the GitHub issues and there's a closed issue for
| F-droid inclusion. The author states that F-droid "Doesn't meet
| their requirements" but doesn't elaborate. I wonder what
| F-droid is missing that they need so much?
| flax wrote:
| Documentation severely lacking. I wanted to know whether this
| does streaming or only batch, as well as examples for integrating
| with Android apps.
| pants2 wrote:
| Considering it uses Whisper, it's probably not streaming.
| refulgentis wrote:
| I did some core work on TTS at Google, at several layers, and
| I've never quite understood what people mean by streaming vs.
| not.
|
| In each and every case I'm familiar with, streaming means
| "send the whole audio thus far to the inference engine,
| inference it, and send back the transcription"
|
| I have a Flutter library that does the same flow as this
| (though via ONNX, so I can cover all platforms), and Whisper
| + Silero is ~identical to the interfaces I used at Google.
|
| If the idea is streaming is when each audio byte is only sent
| once to the server, there's still an audio buffer accumulated
| -- it's just on the server.
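|
| A minimal Python sketch of the flow described above, contrasted with a
| batch call. It is an illustration only, not Transcribro's or Whisper's
| actual code: fake_transcribe is a hypothetical stand-in for the
| inference engine, and each "chunk" is a word rather than real audio
| samples.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: a "chunk" stands in for a block of audio samples.
# Here each chunk is already a word so the example runs without a model.
Chunk = str


def fake_transcribe(audio: List[Chunk]) -> str:
    """Hypothetical inference call; a real engine would run a speech model here."""
    return " ".join(audio)


def streaming(
    chunks: Iterable[Chunk],
    transcribe: Callable[[List[Chunk]], str] = fake_transcribe,
) -> Iterator[str]:
    """Yield a partial transcript after every chunk.

    The whole buffer so far is re-sent to the engine each time, so text
    appears while the speaker is still talking.
    """
    buffer: List[Chunk] = []
    for chunk in chunks:
        buffer.append(chunk)
        yield transcribe(buffer)


def batched(
    chunks: Iterable[Chunk],
    transcribe: Callable[[List[Chunk]], str] = fake_transcribe,
) -> str:
    """Wait for all audio, then run inference once and return the result."""
    return transcribe(list(chunks))


mic = ["hello", "from", "the", "microphone"]
partials = list(streaming(mic))
final = batched(mic)
print(partials)
print(final)
```

| Both paths call the same engine on the same accumulated buffer. The
| only difference in this sketch is how often inference runs and when
| the text is handed back.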
| iamjackg wrote:
| I think in practical terms (at least for me):
|
| - streaming == I talk and the text appears as I talk
|
| - batched == I talk, and after I'm done talking some
| processing happens and the text gets populated
| refulgentis wrote:
| Gotcha. Then it's "not even wrong" in the Pauli sense to
| say Whisper isn't streaming.
| flax wrote:
| "streaming" in this case is like another reply said:
| transcriptions appear as I talk. Compared to not-streaming
| in which the service waits for silence, then processes the
| captured speech, then returns some transcription.
|
| Is your Flutter library available? And does it run locally?
| I'm looking for a good Flutter streaming (in the sense
| above) speech recognition library. Vosk looks good, but
| it's lacking some configurability such as selecting audio
| source.
| refulgentis wrote:
| FONNX, haven't gone out of my way to make it trivial[1],
| but, it's very good, battle tested on every single
| platform. (And yes, it runs locally.)
|
| [1] example app shows how to do everything, there's basic
| doc, but man the amount of nonsense you need to know to
| pull it all together is just too hard to document without
| a specific Q. Do feel free to file an issue
| james2doyle wrote:
| Looks similar to the new FUTO keyboard:
| https://voiceinput.futo.org/
| iamjackg wrote:
| I've been using this for a while (the voice input, not their
| keyboard) and it's so refreshing to be able to just speak and
| have the output come out as fully formed, well punctuated
| sentences with proper capitalization.
| james2doyle wrote:
| I agree. No more "speaking punctuation". Just talk as normal
| and it comes out fully formed
| freedomben wrote:
| I actually don't mind speaking punctuation, in fact it kind
| of helps. What I really hate is the middle-spot where we
| are right now, where it tries to place punctuation and
| sucks badly at it.
| leobg wrote:
| Anything like that available for iOS?
| yjftsjthsd-h wrote:
| But open source, which is a pretty big difference
| grandma_tea wrote:
| FUTO and Transcribro are open source.
| yencabulator wrote:
| FUTO is not open source.
|
| https://gitlab.futo.org/alex/voiceinput/-/blob/master/LICENS...
|
| > FUTO Source First License 1.0
|
| > You may use or modify the software only for
| non-commercial purposes
| Humbly8967 wrote:
| No, FUTO made a new "Source First License"[1] that is not
| Open Source by the OSI definition.
|
| [1] https://github.com/futo-org/android-keyboard/blob/master/LIC...
___________________________________________________________________
(page generated 2024-07-18 23:01 UTC)