Speaker diarization (labels) for OpenAI Whisper generated transcripts

algon33 · on Dec 30, 2022

I tried using this for a technical talk[1], and it got the number of speakers wrong. That is somewhat surprising to me, as I would have thought diarization tech would just work by now.

[1]https://www.youtube.com/watch?v=5lFxURxbyEc&list=PLiayR7yJx8...

ufarooqi · on Dec 30, 2022

I'm gonna give it a try with your video. If I may ask, how many speakers are there in this video? (Otherwise I have to go through all of it.) From what I can see, we have a teacher who is speaking most of the time and then a few laughs from students in the background.

algon33 · on Dec 31, 2022

There are a couple of people interjecting with answers to questions, or asking questions. I'm afraid I don't have a better estimate than that. But in this case, I think lumping the students together as one speaker and the teacher as another would be fine.

sandkoan · on Dec 30, 2022

Woah! I've been facing the same problems with pyannote+whisper for diarization+transcription, and, coincidentally, was just experimenting with combining NeMo and whisper. Do you happen to have a repo for this? It would be invaluable.

Edit: Never mind, found the link: https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831...

ufarooqi · on Dec 30, 2022

I have attached a link to the Google Colab notebook with the article.

https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831...