
Having also built a search engine from scratch, we use a similar method: https://insideropinion.com/

In our case, the "queries" are also what builds the index: every time someone discusses something, we index it, so you can search media, documents, and people by context. We hint at how this works here: https://austingwalters.com/fast-full-text-search-in-postgres...

The downside of our approach is that it needs a lot of conversation data. From their TL;DR version:

"""

- Our model of a web page is based on queries only. These queries could either be observed in the query logs or could be synthetic, i.e. we generate them. In other words, during the recall phase, we do not try to match query words directly with the content of the page. This is a crucial differentiating factor – it is the reason we are able to build a search engine with dramatically less resources in comparison to our competitors.

- Given a query, we first look for similar queries using a multitude of keyword and word vector based matching techniques.

- We pick the most similar queries and fetch the pages associated with them.

- At this point, we start considering the content of the page. We utilize it for feature extraction during ranking, filtering and dynamic snippet generation.

"""
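For anyone who wants to see the query-only recall idea concretely, here's a minimal Python sketch. It's my own toy version, not their implementation: plain keyword overlap (Jaccard) and bag-of-words cosine stand in for the word-vector matching they mention, and `recall`, `query_index` and the example URLs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: set[str], b: set[str]) -> float:
    # Keyword overlap between two token sets.
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of bag-of-words count vectors (Counter yields 0 for missing terms).
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recall(query: str, query_index: dict[str, list[str]], k: int = 2) -> list[str]:
    """Return pages attached to the k stored queries most similar to `query`.

    query_index maps a stored (observed or synthetic) query to the pages it led to.
    Page content is never looked at here; it only matters later, during ranking.
    """
    q_tokens = tokenize(query)
    q_set, q_vec = set(q_tokens), Counter(q_tokens)
    scored = []
    for stored in query_index:
        s_tokens = tokenize(stored)
        # Equal blend of the two signals (an arbitrary choice for this sketch).
        score = 0.5 * jaccard(q_set, set(s_tokens)) + 0.5 * cosine(q_vec, Counter(s_tokens))
        scored.append((score, stored))
    # Stable sort: ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    pages: list[str] = []
    for score, stored in scored[:k]:
        if score == 0:
            break
        for page in query_index[stored]:
            if page not in pages:
                pages.append(page)
    return pages


index = {
    "postgres full text search": ["https://example.com/pg-fts"],
    "postgres count estimate": ["https://example.com/pg-count"],
    "faceted search tutorial": ["https://example.com/facets"],
}
print(recall("full text search in postgres", index, k=1))  # ['https://example.com/pg-fts']
```

The point of the design is that the expensive matching runs over short queries rather than whole documents; page text only comes back in for ranking and snippets.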

It appears 0x65 has figured this out as well: the name of the game is forming proper search queries. In their case, results should be good as soon as they start indexing and generating synthetic queries. IMO, this might work even better for documents and the like.
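To make "generating synthetic queries" a bit more concrete, here's a toy sketch that assumes synthetic queries are just the contiguous word n-grams of a page title. Their TL;DR doesn't say how they actually generate them, so `synthetic_queries`, `attach` and the URL are hypothetical.

```python
import re


def synthetic_queries(title: str, max_n: int = 3) -> set[str]:
    """Every contiguous word n-gram (1 <= n <= max_n) of a title, as a candidate query.

    Hypothetical stand-in: a real system would filter these (stop words,
    query-log priors, a learned generator), which this sketch does not attempt.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    queries = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            queries.add(" ".join(words[i : i + n]))
    return queries


def attach(index: dict[str, list[str]], url: str, title: str) -> None:
    # Register the page under each synthetic query so that query-only recall
    # can find it before any real query logs exist.
    for q in synthetic_queries(title):
        index.setdefault(q, []).append(url)


idx: dict[str, list[str]] = {}
attach(idx, "https://example.com/fts", "Fast Full Text Search")
print(sorted(idx))
```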

Either way, interesting to compare notes! Kudos on the work.



I remember reading your article on FTS in Postgres, great stuff. I was wondering what strategies you use to perform counts on your data?

I'm trying to implement faceted search in Postgres and am currently using window functions to count subcategories (à la http://akorotkov.github.io/blog/2016/06/17/faceted-search/), but I'm not sure it's the most efficient approach.
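For reference, the general shape of window-function facet counting looks roughly like this. It's a sketch, not the linked post's exact query: it uses Python's built-in sqlite3 (window functions need SQLite 3.25+) as a stand-in for Postgres, with a made-up `items` table, and shows a plain GROUP BY alongside for comparison.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, subcategory TEXT);
    INSERT INTO items (category, subcategory) VALUES
        ('books', 'fiction'), ('books', 'fiction'), ('books', 'history'),
        ('music', 'jazz'), ('music', 'rock'), ('music', 'rock');
    """
)

# Window-function style: every matching row carries its facet's count,
# so a single query returns both the hits and the facet totals.
WINDOW_SQL = """
    SELECT id, subcategory,
           COUNT(*) OVER (PARTITION BY subcategory) AS facet_count
    FROM items
    WHERE category = ?
    ORDER BY id
"""

# GROUP BY style: one row per facet value; usually lighter when you only need totals.
GROUP_SQL = """
    SELECT subcategory, COUNT(*) AS facet_count
    FROM items
    WHERE category = ?
    GROUP BY subcategory
    ORDER BY subcategory
"""

for row in conn.execute(WINDOW_SQL, ("books",)):
    print(row)
print(conn.execute(GROUP_SQL, ("books",)).fetchall())  # [('fiction', 2), ('history', 1)]
```

Either way every matching row still gets visited, which is what makes exact counts slow on large result sets.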


Depends on what you want to do. I created an "estimate_count" function to make it much, much faster:

"SELECT planrows FROM estimate_row('SELECT COUNT(*) FROM table WHERE XXX')"
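A rough guess at how a function like that can work is to ask the planner instead of counting. The sketch below (Python, with a hypothetical `plan_rows` helper; it's not the actual estimate_count code) reads the "Plan Rows" estimate from `EXPLAIN (FORMAT JSON)` output. The sample plan is trimmed and its numbers are made up, and running the EXPLAIN through your driver is left out.

```python
import json
from typing import Any, Union


def plan_rows(explain_output: Union[str, list[Any]]) -> int:
    """Return the planner's row estimate from `EXPLAIN (FORMAT JSON) <query>` output.

    Accepts the raw JSON text or an already-decoded list (some drivers decode it).
    If the top node is a plain Aggregate (e.g. the query was SELECT COUNT(*) ...),
    descend to its child, since the Aggregate itself reports a single output row.
    Parallel plans (Gather / Partial Aggregate) are not handled in this sketch.
    """
    plans = json.loads(explain_output) if isinstance(explain_output, str) else explain_output
    node = plans[0]["Plan"]
    while node.get("Node Type") == "Aggregate" and node.get("Plans"):
        node = node["Plans"][0]
    return int(node["Plan Rows"])


# Trimmed, made-up plan for something like SELECT COUNT(*) FROM items WHERE ...
sample = """[{"Plan": {"Node Type": "Aggregate", "Strategy": "Plain", "Plan Rows": 1,
  "Plans": [{"Node Type": "Seq Scan", "Relation Name": "items", "Plan Rows": 2550}]}}]"""
print(plan_rows(sample))  # 2550
```

The estimate is only as good as the table statistics (kept fresh by ANALYZE) and can be far off for complex WHERE clauses, so it suits "about N results" facet labels better than exact totals.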


That's actually brilliant.

If you're ever looking for something to write about for a new blog post, I would love to learn more about how you implemented that estimate_count function.

Thanks for pointing me in the right direction tho!



