If folks are interested, @antirez has opened a C implementation of Voxtral Mini 4B here: https://github.com/antirez/voxtral.c
I have my own fork here: https://github.com/HorizonXP/voxtral.c where I’m working on a CUDA implementation, plus some other niceties. It’s working quite well so far, but I haven’t got it to match Mistral AI’s API endpoint speed just yet.
Awesome work. It would be good to have it work with handy.computer. Also, are there plans to support streaming?
Just tried out Handy. This is a much better and more lightweight UI than the previous solutions I've tried! I know it wasn't your intention, but thank you for the recommendation!
That said, I now agree with your original statement and really want Voxtral support...
Handy is awesome, and easy to fork. I highly recommend building it from source and submitting PRs if there are any features you want. The author is highly responsive and open to vibe-coded PRs as long as you do a good job. (Obviously you should read the code and stand by it before you submit a PR; I just mean he doesn't flatly reject all AI code like some other projects do.) I recently submitted a PR to add an onboarding flow for Macs, and it just got merged, so now I'm hooked.
hm, seems broken on my machine (Firefox, Asahi Linux, M1 Pro). I said hello into the mic, and it churned for a minute or so before giving me:
panorama panorama panorama panorama panorama panorama panorama panorama� molest rist moundothe exh� Invothe molest Yan artist��������� Yan Yan Yan Yan Yanothe Yan Yan Yan Yan Yan Yan Yan
Man, I'd love to fine-tune this, but alas the Hugging Face implementation isn't out as far as I can tell.
I just tried it. I said "what's up buddy, hey hey stop" and it transcribed this for me: " وطبعا هاي هاي هاي ستوب" No, I'm not in any Arabic-speaking or Middle Eastern country. The second test was better: it detected English.
fwiw, that is the right-ish transliteration into Arabic. It just picked the wrong language to transcribe to lol
Notably, this isn't even close to realtime on an M4 Max.
>init failed: Worker error: Uncaught RuntimeError: unreachable
Anything I can do to fix/try it on Brave?
I would check memory and make sure you have free RAM. Tested here: https://imgur.com/a/3vLJ6no Not perfect dictation, but close enough.
Does disabling shields help?