Speaker diarization (labels) for OpenAI Whisper generated transcripts (ufarooqi.com)
44 points by ufarooqi on Dec 30, 2022 | 5 comments


I tried using this for a technical talk[1], and it got the number of speakers wrong. That is somewhat surprising to me, as I would have thought diarization tech would just work by now.

[1] https://www.youtube.com/watch?v=5lFxURxbyEc&list=PLiayR7yJx8...


I'm gonna give it a try with your video. If I may ask, how many speakers are there in this video? (Otherwise I have to go through all of it.) From what I can see, there is a teacher who speaks most of the time, plus a few laughs from students in the background.


There are a couple of people interjecting with answers to questions, or asking questions. I'm afraid I don't have a better estimate than that. But in this case, I think lumping the students together as one speaker and the teacher as another would be fine.
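
To illustrate the lumping idea, here is a minimal sketch in plain Python. It assumes the diarizer output has already been converted into (start, end, label) tuples. That format, the function name lump_speakers, and the "teacher"/"students" labels are hypothetical and not taken from the article or from any library. It treats whichever label has the most total talk time as the teacher and merges every other label into one.

```python
from collections import defaultdict


def lump_speakers(turns, main_label="teacher", other_label="students"):
    """Collapse diarization labels into two groups.

    `turns` is a list of (start_sec, end_sec, label) tuples (assumed format).
    The label with the most total talk time becomes `main_label`;
    every other label becomes `other_label`.
    """
    talk_time = defaultdict(float)
    for start, end, label in turns:
        talk_time[label] += end - start

    if not talk_time:
        return []

    # The dominant speaker is the one who talks the longest overall.
    dominant = max(talk_time, key=talk_time.get)
    return [
        (start, end, main_label if label == dominant else other_label)
        for start, end, label in turns
    ]


# Made-up example data: one long-talking speaker and two brief interjections.
turns = [
    (0.0, 42.0, "SPEAKER_00"),
    (42.0, 44.5, "SPEAKER_01"),
    (44.5, 90.0, "SPEAKER_00"),
    (90.0, 91.0, "SPEAKER_02"),
]
lumped = lump_speakers(turns)
```

This only relabels turns after the fact. It does not fix cases where the diarizer has split the teacher across several labels, which would need a clustering-level fix instead.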


Woah! I've been facing the same problems with pyannote+whisper for diarization+transcription, and, coincidentally, was just experimenting with combining NeMo and whisper. Do you happen to have a repo for this? It would be invaluable.

Edit: Never mind, found the link: https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831...
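
For anyone wondering what combining diarization with transcription can look like, here is a rough sketch of one common approach: give each transcript segment the speaker whose diarization turn overlaps it the most. This is not necessarily what the linked notebook does. It assumes transcript segments are dicts with "start", "end" and "text" keys (the shape of Whisper's result["segments"]) and that diarization turns are (start, end, speaker) tuples. The tuple format and the names overlap and label_segments are made up for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Attach a "speaker" key to each transcript segment.

    `segments`: list of dicts with "start", "end", "text" (assumed shape).
    `turns`: list of (start_sec, end_sec, speaker) tuples (assumed shape).
    Each segment gets the speaker of the turn it overlaps the most,
    or `unknown` if it overlaps no turn at all.
    """
    labeled = []
    for seg in segments:
        best_label, best_overlap = unknown, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_label, best_overlap = speaker, shared
        labeled.append({**seg, "speaker": best_label})
    return labeled


# Made-up example data.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the lecture."},
    {"start": 4.0, "end": 6.5, "text": "Can you repeat that?"},
    {"start": 20.0, "end": 21.0, "text": "Thanks."},
]
turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 7.0, "SPEAKER_01")]

labeled = label_segments(segments, turns)
for seg in labeled:
    print(f'[{seg["speaker"]}] {seg["text"]}')
```

Matching at the segment level is coarse: a segment that spans a speaker change gets a single label. Word-level timestamps would allow a finer split, at the cost of more bookkeeping.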


I have included a link to the Google Colab notebook in the article.

https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831...



