Love this.
It says MIT license, but then the readme has a separate section on prohibited use that maybe adds restrictions making it nonfree? Not sure of the legal implications here.
Good question.
If a license says "you may use this, you are prohibited from using this", and I use it, did I break the license?
Oh this is sweet, thanks for sharing! I've been a huge fan of Kokoro and even set up my own fully-local voice assistant [1]. Will definitely give Pocket TTS a go!
[1] https://github.com/acatovic/ova
Kokoro is better for TTS by far.
For voice cloning, Pocket TTS is gated, so I can't tell.
Thanks for sharing your repo, looks super cool. I'm planning to try it out. Is it based on MLX or just HF Transformers?
Thank you, just Transformers.
Good quality, but unfortunately it is single-language: English only.
I echo this. For a TTS system to be in any way useful outside the tiny population of the world that speaks exclusively English, it must be multilingual and dynamically switch between languages pretty much per word.
Cool tech demo though!
But it wouldn't be for those who "speak exclusively English"; rather, for anyone who speaks English. Not only that, but it's also common to have the system language set to English even if one's native language is different.
There are about 1.5B English speakers on the planet.
That's a pretty crazy requirement for something to be "useful", especially for something that runs this efficiently on CPU. Many content creators from non-English-speaking countries can benefit from this type of release by translating transcripts of their content to English and then running them through a model like this, dubbing their videos in a language that can reach many more people.
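As a rough sketch of that workflow: the translation step below uses the real Transformers pipeline, but the TTS call at the end is a hypothetical placeholder, since I haven't checked what Python API Pocket TTS actually exposes.

```python
# Translate-then-dub sketch. Translation uses the real Hugging Face
# Transformers pipeline; the final TTS call is a hypothetical placeholder
# standing in for whatever entry point Pocket TTS actually exposes.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

transcript_de = "Hallo und willkommen zu meinem Video."
transcript_en = translator(transcript_de)[0]["translation_text"]

# Hypothetical: swap in the real Pocket TTS invocation here.
# audio = pocket_tts.synthesize(transcript_en)
print(transcript_en)
```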
Nice!
Just made it an MCP server so Claude can tell me when it's done with something :)
https://github.com/Marviel/speak_when_done
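For anyone curious what that takes, here's a minimal sketch using the official MCP Python SDK's FastMCP, with macOS `say` standing in for the actual TTS call (the linked repo may well do it differently):

```python
# Minimal MCP server sketch: one tool that speaks a message out loud so an
# agent can call it when a long-running task finishes. Uses the official MCP
# Python SDK (FastMCP); the audio step shells out to macOS `say` for brevity,
# but any local TTS (e.g. Pocket TTS) could sit behind it instead.
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("speak-when-done")

@mcp.tool()
def speak(message: str) -> str:
    """Speak a short notification message to the user."""
    subprocess.run(["say", message], check=True)
    return f"spoke: {message}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```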
It's cool how lightweight it is. Recently added support to Vision Agents for Pocket. https://github.com/GetStream/Vision-Agents/tree/main/plugins...
Is there something similar for STT? I’m using Whisper distil models and they work OK, but sometimes they get what I say completely wrong.
Parakeet is not really more accurate than Whisper, but it's much faster - faster than realtime even on CPU: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3 . You have to use NeMo though, or mess around with third-party conversions. (It also has a big brother, Canary: https://huggingface.co/nvidia/canary-1b-v2. There's also the confusingly named/positioned Nemotron Speech: https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...)
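For reference, the NeMo usage per the Parakeet model card is only a few lines:

```python
# Transcription with Parakeet via NeMo, following the model card.
# Requires: pip install -U "nemo_toolkit[asr]"
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v3"
)

# transcribe() takes a list of audio file paths and returns hypotheses
output = asr_model.transcribe(["sample.wav"])
print(output[0].text)
```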
Keep in mind Parakeet is pretty limited in the number of languages it supports compared to Whisper.
From the other day: https://github.com/cjpais/Handy
I love that everyone is making their own TTS models, as they are not as expensive as many other models to train. Also, there are plenty of different architectures.
Another recent example: https://github.com/supertone-inc/supertonic
In-browser demo of Supertonic with WASM:
https://huggingface.co/spaces/Supertone/supertonic-2
Another one is Soprano-1.1.
It seems to be trained by a single person, and it is surprisingly natural for such a small model.
I remember when TTS always meant the most robotic, barely comprehensible voices.
https://www.reddit.com/r/LocalLLaMA/comments/1qcusnt/soprano...
https://huggingface.co/ekwek/Soprano-1.1-80M
Thank you. Very good suggestion, with code available and bindings for so many languages.
>If you want access to the model with voice cloning, go to https://huggingface.co/kyutai/pocket-tts and accept the terms, then make sure you're logged in locally with `uvx hf auth login`
lol
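To be fair, once the terms are accepted it's the standard gated-checkpoint flow; a minimal sketch with huggingface_hub:

```python
# Sketch: after accepting the terms on the model page and logging in
# (e.g. `uvx hf auth login`), the gated files download like any other repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("kyutai/pocket-tts")
print(local_dir)  # path to the cached model files
```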
Relative to AmigaOS translator.device + narrator.device, this sure seems bloated.