Astro - Hacker News

corlinp 5 minutes ago

I created Voibe which takes a slightly different direction and uses gpt-4o-transcribe with a configurable custom prompt to achieve maximum accuracy (much better than Whisper). Requires your own OpenAI API key.

https://github.com/corlinp/voibe

I do see the name has since been taken by a paid service... shame.
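
For anyone curious what the gpt-4o-transcribe plus custom prompt setup could look like, here is a minimal Python sketch. It is not Voibe's actual code. It assumes the official `openai` package and an `OPENAI_API_KEY` environment variable; the helper names and the prompt wording are made up for illustration.

```python
def build_transcription_params(vocabulary, style_hint="Use normal punctuation."):
    """Build request parameters with a custom prompt (hypothetical helper)."""
    # The prompt nudges the model toward the spellings the user cares about.
    prompt = f"{style_hint} Terms that may appear: {', '.join(vocabulary)}."
    return {"model": "gpt-4o-transcribe", "prompt": prompt}


def transcribe(audio_path, vocabulary):
    """Send one audio file to the OpenAI transcription endpoint."""
    # Imported here so the helper above works without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    params = build_transcription_params(vocabulary)
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(file=audio_file, **params)
    return result.text
```

Calling `transcribe("note.wav", ["Parakeet", "Groq"])` would need network access and your own key; `note.wav` is a placeholder file name.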

digitalbase an hour ago

Was searching for this this morning and settled on https://handy.computer/

vogtb 25 minutes ago

Handy rocks. I recently had minor surgery on my shoulder that required me to be in a sling for about a month, and I thought I'd give Handy a try for dictating notes and so on. It works phenomenally well for most speech-to-text use cases - homonyms included.
zachlatta an hour ago

I just learned about Handy in this thread and it looks great!
I think the biggest difference between FreeFlow and Handy is that FreeFlow implements what Monologue calls "deep context", where it post-processes the raw transcription with context from your currently open window.
This fixes misspelled names if you're replying to an email / makes sure technical terms are spelled right / etc.
The original hope for FreeFlow was for it to use all local models like Handy does, but with the post-processing step the pipeline took 5-10 seconds instead of <1 second with Groq.
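
To make the post-processing step concrete, here is a rough Python sketch of the idea, not FreeFlow's actual implementation. The function names, the prompt wording, and the `complete` callable (a stand-in for whatever LLM you call, hosted or local) are all assumptions.

```python
from typing import Callable


def build_cleanup_prompt(raw_transcript: str, window_context: str,
                         max_context_chars: int = 2000) -> str:
    """Combine the raw transcript with text from the active window."""
    # Keep only the tail of the context so the prompt stays small.
    context = window_context[-max_context_chars:]
    return (
        "Fix the spelling of names and technical terms in the transcript, "
        "using the on-screen context as a reference. "
        "Do not add or remove content.\n\n"
        f"On-screen context:\n{context}\n\n"
        f"Transcript:\n{raw_transcript}\n"
    )


def post_process(raw_transcript: str, window_context: str,
                 complete: Callable[[str], str]) -> str:
    """Run the cleanup pass; fall back to the raw text if the model returns nothing."""
    if not raw_transcript.strip():
        return raw_transcript
    cleaned = complete(build_cleanup_prompt(raw_transcript, window_context)).strip()
    return cleaned or raw_transcript
```

The extra model call in `post_process` is where the added latency would come from, which is the trade-off described above.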
- stavros 37 minutes ago
  
  As a very happy Handy user, I can confirm it doesn't do that. It will be interesting to see whether FreeFlow works better. I'll give it a shot, thanks!
stavros 38 minutes ago

I use Handy as well, and love it.
hendersoon an hour ago

Yes, I also use Handy. It supports local transcription via Nvidia Parakeet TDT2, which is extremely fast and accurate. I also use Gemini 2.5 Flash Lite for post-processing via the free AI Studio API (post-processing is optional and can also use a locally hosted LM).

p0w3n3d 2 hours ago

There's also an offline app for macOS called VoiceInk. No need for Groq or external AI.

https://github.com/Beingpax/VoiceInk

parhamn an hour ago

+1, my experience improved quite a bit when I switched to the Parakeet model. They should definitely make it the default.
zackify 41 minutes ago

My favorite too. I use the Parakeet model.

vesterde 40 minutes ago

Since many are asking about apps with similar capabilities: I’m very happy with MacWhisper. It has Parakeet and near-instant transcription of my lengthy monologues. All local.

Edit: Ah but Parakeet I think isn’t available for free. But very worthwhile single purchase app nonetheless!

kombinar 2 hours ago

Sounds like there's plenty of interest in these kinds of tools. I'm not a huge fan of API transcription, given how good local models are.

I built https://github.com/bwarzecha/Axii to keep EVERYTHING local and fully open source, so it can easily be used at any company. No data is sent anywhere.

lemming 44 minutes ago

Is it possible to customise the key binding? Most of these services let you customise the binding, and also support toggle for push-to-talk mode.

baxtr 39 minutes ago

Is there a tool that preserves the audio? I want both, the transcript and the audio.

heyalexej 17 minutes ago

At a quick glance: FreeFlow already saves WAV recordings for every transcript to ~/Lib../App../FreeFlow/audio/ with UUIDs linking them to pipeline history entries in CoreData. The audio files are automatically deleted, though, when their associated history entries are deleted. Should be a quick fix. I recently did the same for hyprvoice, for debugging and auditing.
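
As a toy illustration of that behaviour and the possible fix (in Python rather than FreeFlow's own code, with an in-memory dict standing in for CoreData), the sketch below names each WAV file after its history entry's UUID and makes the delete-with-entry behaviour optional. The `History` class and the `keep_audio_on_delete` flag are hypothetical names.

```python
import tempfile
import uuid
from pathlib import Path


class History:
    """Transcript history whose entries are linked to WAV files by UUID."""

    def __init__(self, audio_dir: Path, keep_audio_on_delete: bool = False):
        self.audio_dir = audio_dir
        self.keep_audio_on_delete = keep_audio_on_delete
        self.entries = {}  # entry id -> transcript text
        audio_dir.mkdir(parents=True, exist_ok=True)

    def audio_path(self, entry_id: str) -> Path:
        return self.audio_dir / f"{entry_id}.wav"

    def add(self, transcript: str, wav_bytes: bytes) -> str:
        entry_id = str(uuid.uuid4())
        self.audio_path(entry_id).write_bytes(wav_bytes)
        self.entries[entry_id] = transcript
        return entry_id

    def delete(self, entry_id: str) -> None:
        del self.entries[entry_id]
        # Default mirrors the described behaviour: audio goes with the entry.
        if not self.keep_audio_on_delete:
            self.audio_path(entry_id).unlink(missing_ok=True)


if __name__ == "__main__":
    history = History(Path(tempfile.mkdtemp()) / "audio", keep_audio_on_delete=True)
    entry = history.add("hello world", b"RIFF")  # placeholder bytes, not a real WAV
    history.delete(entry)
    print(history.audio_path(entry).exists())
```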

spelk an hour ago

Does anyone know of an effective alternative for Android?

xnx 10 minutes ago

Does the Android keyboard transcription not work for your needs?
jskherman 42 minutes ago

Check out the FUTO keyboard or FUTO voice input apps. They only use Whisper models so far, though.

sonu27 2 hours ago

Nice! I vibe coded the same thing this weekend, but for OpenAI and less polished: https://github.com/sonu27/voicebardictate

manmal an hour ago

Also look into Voxtral. Their new model is good and half the price if you can live without streaming.

hodanli 21 minutes ago

title lacks: for Mac

arcologies1985 2 hours ago

Could you make it use Parakeet? That's an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.

zachlatta 2 hours ago

I love this idea, and originally planned to build it using local models, but to have post-processing (that's where you get correctly spelled names when replying to emails / etc), you need to have a local LLM too.
If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of <1s). I also had concerns around battery life.
Some day!
s0l 2 hours ago

https://github.com/cjpais/Handy
It’s free and offline
- zachlatta 2 hours ago
  
  Wow, Handy looks really great and super polished. Demo at https://handy.computer/

an hour ago

[deleted]

Fidelix 2 hours ago

macOS only. May this help you skip a click.

spelk an hour ago

Whispering [0] is Windows compatible and has gotten a lot better on Windows despite being extremely rough around the edges at first.
[0] https://github.com/EpicenterHQ/epicenter