We started indexing gobs of YouTube videos and made the world's most luxurious, diamond-encrusted video search engine. The audio is stripped and digested with Deepgram (that's us, hi!). When you search, you're brought right to the moment when people are talking about your precious little search term.
The index is 4 million+ seconds of video and growing.
The search tool worked better than I expected it to. Good job on that. I read on the YC blog[1] that you're using deep learning to index audio - can you explain more about this? Do you have any experiments showing how this is better than generating a transcript of the audio?
Yep, we use deep learning to generate an index of audio features, plus an approximate search to look through it all. We've done lots of testing on both clear and average/noisy audio. With standard transcription matching you get alright results (50% retrieval) on clear audio but really terrible results (<20% retrieval) on average/noisy audio. With Deepgram you get great results (90% retrieval) on clear audio and still really good results (80% retrieval) on average/noisy audio.
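For anyone wondering what "an index of audio features plus a search over it" can look like in the abstract, here's a toy sketch. To be clear, this is not our actual pipeline: the feature vectors, the `Entry` layout, and the `search` function are all made up for illustration, and it uses exact brute-force cosine similarity as a stand-in for a real approximate search.

```python
import math
from typing import List, Tuple

# Hypothetical index entry: (video_id, start_second, feature_vector).
# In a real system the vectors would come from a trained audio model;
# here they are hand-written placeholders.
Entry = Tuple[str, int, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index: List[Entry], query: List[float], k: int = 3) -> List[Tuple[str, int]]:
    """Return the (video_id, start_second) of the k entries most similar to the query.

    This scans every entry (exact search). An approximate method would
    trade a little accuracy for speed on a large index.
    """
    scored = [(cosine(vec, query), vid, sec) for vid, sec, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(vid, sec) for _, vid, sec in scored[:k]]


# Tiny made-up index: three one-second windows from two videos.
index: List[Entry] = [
    ("vid_a", 12, [0.9, 0.1, 0.0]),
    ("vid_a", 47, [0.1, 0.8, 0.1]),
    ("vid_b", 5, [0.0, 0.2, 0.9]),
]

# A query vector that is closest to the first window.
top = search(index, [1.0, 0.0, 0.0], k=2)
```

The point of returning `(video_id, start_second)` pairs is that a hit can drop you at a moment in the video rather than just at the video itself.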
We're always trying to make the indexing and search better/faster. For Hoogley we had to pull some tricks to get the search to be really fast for a lot of users, so it isn't quite as high quality as it can be. Working on it though ;)
I like the idea (and we have the same retarded sense of humor) but it isn't consistently useful. I searched "hilary bosnia sniper fire" and got nothing.
So spend fewer resources on your design and more on functionality.
EDIT: Additional feedback: Your product doesn't work. If you test it yourself for a few minutes you'll see that. It is your job to test it, not the world's. You're not "getting it out there" so much as turning people away forever because they will think "I've already tried that; it didn't work". This is kinda like when my programmer employees/contractors tell me that they finished a feature but I know they never bothered to try it because it doesn't work at all.
What?! You don't like clickbait-y titles that point to POCs?!! You are totally right, it's a POC. <<slink away>> It's just a project we put out really, really early, but we hope it's fun/interesting.
Indexing the entire web's videos and presenting results well is a big computational+UI problem that we are still working on. We mostly help businesses get value out of their audio/video with our API. A full-scale search engine is a ways away.