
I'm no specialist in the field at all, but in the context of Kyutai they explained a bit of the workflow they used to build their speech-to-speech model. Basically it boils down to this: if you want to train a TTS (text-to-speech) model, you can transcribe existing audio with an STT (speech-to-text) model, and then you have supervised audio/text pairs. You can even add as much noise to the audio as you want, keeping the same transcript, to train a noise-resistant STT model.
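Here is a rough Python sketch of that data flow, under my own assumptions rather than Kyutai's actual pipeline. `transcribe` is a hypothetical stand-in for whatever STT model you have. Audio is a plain list of float samples, and "noise" is assumed to be white Gaussian noise at a chosen signal-to-noise ratio (SNR). Labels come from the clean clip, so the noisy copies keep a correct transcript.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Audio = List[float]
Pair = Tuple[Audio, str]


def add_noise(audio: Audio, snr_db: float, rng: random.Random) -> Audio:
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    if not audio:
        return []
    signal_power = sum(x * x for x in audio) / len(audio)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in audio]


def build_pairs(
    clips: Sequence[Audio],
    transcribe: Callable[[Audio], str],  # hypothetical STT model
    snr_db_choices: Sequence[float],
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Pseudo-label unlabeled audio with an STT model.

    Returns (tts_pairs, stt_pairs): clean (audio, text) pairs for TTS training,
    and noisy copies that reuse the clean transcript for noise-robust STT training.
    """
    rng = random.Random(seed)
    tts_pairs: List[Pair] = []
    stt_pairs: List[Pair] = []
    for clip in clips:
        text = transcribe(clip)  # label the clean audio once
        tts_pairs.append((clip, text))
        for snr in snr_db_choices:
            stt_pairs.append((add_noise(clip, snr, rng), text))
    return tts_pairs, stt_pairs
```

For example, with a dummy transcriber, `build_pairs([clip], lambda c: "hello", [20.0, 5.0])` yields one clean pair for TTS and two noisy pairs labeled `"hello"` for STT. The actual quality of this approach depends entirely on how accurate the STT model's transcripts are.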