Do any of the top models let you pause and think while speaking? I have to speak non-stop to Gemini assitant and ChatGPT, which is very very useless/unnatural for voice mode. Specially for non-english speakers probably. I sometimes have to think more to translate my thoughts to english.
Have you tried talking to ChatGPT in your native tongue? I was blown away by my mother speaking her native tongue to ChatGPT and having it respond in that language. (It's ever so slightly not a mainstream one.)
Magpie-TTS needs a kernel compiled targeting Ampere, but it appears to be closed source. It was compiled for the 2018 T4, but not 2020-2024 consumer cards, just 2025 consumer cards.
I actually forked the repo, modified the Dockerfile and build/run scripts targeting Ampere and the whole setup is running seamlessly on my 3090, Magpie is running fine and using under 3Gb of memory, ~2Gb for nemotron STT, and ~18Gb for Nemotron Nano 30b. Latencies are great and the turn detection works really well!
I'm going to use this setup as the base for a language learning App for my gf :)
Make sure you’re using the nemotron-speech asr model. I added support for Spanish via Canary models but these have like 10x the latency: 160ms on nemotron-speech vs 1.5s canary.
For the LLM I’m currently using Mistral-Small-3.2-24B-Instruct instead of Nemotron 3 and it works well for my use case
https://manpages.ubuntu.com/manpages/trusty/man1/festival.1....
But it is quite old now and pre-dates the DL/AI era.
Does anybody know of a good modern replacement that I can "apt install"?