It helps you comprehend research papers (and not just papers, but any document in any language) faster.
The tool is free to use because we have credits from GCP. I expect at some point we'll need to introduce some level of subscription fee to keep it alive and useful, since it uses LLMs and vector search quite heavily.
Leo Boytsov, my guest on the Vector Podcast, made an honest, eye-opening claim: vector search is intellectually rewarding but professionally undervalued. What caught my attention was how he gives credit to the people who actually deserve it and speaks modestly about his own achievements, even though as a researcher he has accumulated over 1800 citations by now and helped create the famous HNSW vector search algorithm.
I'm working on a tool that includes AI. My original target is to test it on my https://www.youtube.com/c/VectorPodcast by offering something like what Lex Fridman does for his episodes.
Current features:
1. Download from YT
2. Transcribe using Vosk (the output includes time codes)
3. Speaker diarization using pyannote - this isn't perfect yet and needs a bit more ironing out.
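Steps 2 and 3 produce two independent streams: time-coded words from Vosk and speaker segments from pyannote. A minimal sketch of how they could be merged (the tuple formats and the `assign_speakers` helper are my own assumptions for illustration, not the tool's actual data model): each word is assigned the speaker whose diarization segment overlaps it the most.

```python
def assign_speakers(words, speaker_segments):
    """Label each transcribed word with a speaker via time overlap.

    words: list of (word, start_sec, end_sec), e.g. built from Vosk output.
    speaker_segments: list of (speaker, start_sec, end_sec), e.g. derived
        from a diarization result (format assumed for this sketch).
    """
    labelled = []
    for word, w_start, w_end in words:
        best_speaker, best_overlap = "unknown", 0.0
        for speaker, s_start, s_end in speaker_segments:
            # Overlap of the word's time span with this speaker segment.
            overlap = min(w_end, s_end) - max(w_start, s_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labelled.append((word, w_start, w_end, best_speaker))
    return labelled

words = [("hello", 0.0, 0.4), ("there", 0.5, 0.9), ("hi", 1.2, 1.5)]
segments = [("SPEAKER_A", 0.0, 1.0), ("SPEAKER_B", 1.0, 2.0)]
transcript = assign_speakers(words, segments)
# "hello" and "there" fall inside SPEAKER_A's segment, "hi" inside SPEAKER_B's.
```

Picking the segment with maximum overlap (rather than the one containing the word's start time) makes the merge a bit more robust to diarization boundary errors, which is where pyannote output tends to need ironing out.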
What needs to be done:
4. Store the transcription in a search engine (can include vectors)
5. Implement a webapp
If anyone here is interested in joining forces, let me know.
I write about vector search, ANN algorithms, neural search frameworks, search engines and algorithms in general and publish episodes of the Vector Podcast.
It was great to discuss these topics and more with Jo Kristian:
- History of Vespa
- Tensor data structure and its use cases
- Multi-stage ranking pipeline
- Game-changing vector search in Vespa
- Approximate vs exact nearest neighbor search tradeoffs
- Misconceptions in neural search
- Multimodal search is where vector search shines
- Power of building fully-fledged demos
- How to combine vector search with sparse search: Reciprocal Rank Fusion
- The question of WHY (my favourite)
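On combining vector search with sparse search: Reciprocal Rank Fusion merges the ranked lists from both retrievers using only rank positions, so the incompatible score scales of BM25 and cosine similarity never need to be calibrated. A minimal sketch (the document ids and the constant k=60, the value suggested in the original RRF paper, are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with RRF.

    rankings: list of ranked lists, each ordered best-first.
    k: smoothing constant; larger k dampens the advantage of top ranks.
    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Best fused score first.
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d1", "d2", "d3"]   # e.g. a BM25 result list
dense = ["d3", "d1", "d4"]    # e.g. a nearest-neighbor result list
fused = reciprocal_rank_fusion([sparse, dense])
# d1 and d3 appear high in both lists, so they rise to the top.
```

Because only ranks matter, the same function fuses any number of retrievers regardless of how each one scores documents.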
I wonder which topics interest the HN community -- that would help me focus on them and weave them into my questions in new episodes.
Thanks for the article - I learnt about new search engines despite having spent a couple of years recently in web-scale search. You might consider https://usearch.com/ as another dimension in web-scale search: its query log is learnt from the data, which makes it quite unique.
Great project! Elasticsearch / OpenSearch / Solr have their own learning-to-rank plugins. Have you considered integrating Metarank with such systems? Or is your vision to provide a reranker layer that is independent of the underlying search engine architecture?
We considered creating a plugin for Elasticsearch, but one already exists (ES-LTR), and that architecture limits the ability to build a good multi-purpose system.
We're still considering building plugins to make integration with existing search technologies easier, and we'll keep an eye on the demand for this.
Feedback is welcome!