Hacker News

For a product like yours, don't you think a NoSQL solution would be a better fit? If yes, why didn't you opt for it? If no, then why (especially considering the fact that your schemas are always in a state of flux)? PS: This is under the assumption that you have used Postgres for storing all the data from different sources.


Not OP, but having structure in a datastore is important. It decreases development and maintenance time and doesn't let vague bugs fester for years. NoSQL is a premature optimization that many people make without understanding the downsides.


Completely agree. But in the above case, the structure of the third-party service API will always change, and each change may require a migration. I worked on a similar problem in my last stint and used Postgres. There I opted for predefined fields and handled the extra logic in application code. In the above case, I just wanted to know if there is a better way to solve the problem in Postgres.


I get a lot of mileage out of using real columns for the data with fixed structure, and Postgres JSONB columns for the parts of the data whose structure varies.
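A sketch of that hybrid approach, using a hypothetical `events` table (the column names and sample query are illustrative):

```sql
-- Fixed, well-understood attributes get real columns;
-- the varying third-party payload goes into a JSONB column.
CREATE TABLE events (
    id         bigserial   PRIMARY KEY,
    source     text        NOT NULL,          -- e.g. 'stripe', 'github'
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb       NOT NULL           -- whatever the upstream API sent
);

-- A GIN index makes containment queries on the flexible part fast.
CREATE INDEX events_payload_idx ON events USING gin (payload);

-- Queries can mix relational predicates with JSONB operators.
SELECT id, payload->>'status'
FROM events
WHERE source = 'stripe'
  AND payload @> '{"type": "invoice.paid"}';
```

When the upstream API changes, only the contents of `payload` change; no migration is needed unless a field graduates into a real column.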


Hi devj, You're absolutely correct that an RDBMS would be an odd solution for a search index *

Anyway, on CTX I use PostgreSQL for storing things that deserve to be relational - job control, batching, users, accounts, billing, invitation codes...

All the indexed content is in an Elasticsearch cluster (http://elastic.co if you're unfamiliar - it's a specialised search indexing data store built atop Apache Lucene).

* though actually PostgreSQL has a really good capability as a JSON store that means people do sometimes use it as a NoSQL solution


PostgreSQL has really good capability not just as a JSON store, but also for search and indexing.

Its text search solutions come close to ElasticSearch, with - surprisingly - even better performance.
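For reference, Postgres's built-in full-text search in a minimal form, on a hypothetical `docs` table (the generated column requires PostgreSQL 12+; on older versions a trigger maintains the `tsvector` instead):

```sql
CREATE TABLE docs (
    id   bigserial PRIMARY KEY,
    body text NOT NULL
);

-- Precompute the tsvector and index it so searches don't re-parse every row.
ALTER TABLE docs
    ADD COLUMN body_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;

CREATE INDEX docs_body_tsv_idx ON docs USING gin (body_tsv);

-- Ranked search:
SELECT id, ts_rank(body_tsv, query) AS rank
FROM docs, to_tsquery('english', 'search & index') AS query
WHERE body_tsv @@ query
ORDER BY rank DESC;
```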


> Its text search solutions come close to ElasticSearch, with - surprisingly - even better performance.

PG search may be good enough for certain use cases, but it doesn't come close to the power of Lucene/Solr/ES. PG only recently added support for phrase search, which Lucene has had for years. Lucene has an extremely flexible analysis pipeline, BM25, "more like this" queries, simple custom ranking, great language support, "did you mean?", autocomplete, etc. ES in particular is built from the ground up as a clustered solution.

As I said PG may be good enough for some use cases, but to claim it is as good as ES is laughable. And PG certainly doesn't make sense if your product is literally a search product.
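For context, the phrase search mentioned above landed in PostgreSQL 9.6 as the `<->` (followed-by) tsquery operator; a minimal illustration:

```sql
-- Matches only when 'full' is immediately followed by 'text'.
SELECT to_tsvector('english', 'full text search in postgres')
       @@ to_tsquery('english', 'full <-> text');        -- true

-- phraseto_tsquery builds the same kind of query from plain text.
SELECT to_tsvector('english', 'full text search in postgres')
       @@ phraseto_tsquery('english', 'text search');    -- true
```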


All things you mentioned can be done in PGSQL with similar performance, actually.

I had to build a search product in PGSQL, and I’ve done exactly that (and compared it with ES, Solr and Lucene).

For datasets below 50GB the performance is basically the same, you get about the same features, and it works well enough.

But you are right, as soon as you want to build more complicated search products, or as soon as you get larger amounts of data, you want a dedicated solution.


tbh I haven't looked into PostgreSQL as a direct competitor to ES in the text search space, mostly because I have quite a lot of ES experience with things like aggregation for faceting, stemming and textual analysis, and I had a (perhaps incorrect) assumption that PostgreSQL was less capable in those more specialised areas.

One thing I would like to understand is how the PostgreSQL search experience scales with data volume. My main cluster for CTX will potentially get (very) big and I don't know enough about PostgreSQL, multi-master and full-text search across large amounts of data.

I'll definitely do some digging though, thanks!


> tbh I haven't looked into PostgreSQL as a direct competitor to ES in the text search space, mostly because I have quite a lot of ES experience with things like aggregation for faceting, stemming and textual analysis, and I had a (perhaps incorrect) assumption that PostgreSQL was less capable in those more specialised areas.

I think your assumption is/was correct. And I'm a postgres dev, so I'm biased as hell...


PostgreSQL is very capable at stemming and textual analysis, but the issue, as you likely know, starts with large amounts of data – because then you suddenly need multiple replicas, and that’s where the issues start.


You might want to look at https://github.com/zombodb/zombodb. It extends PG to use ES as an external source for indexes. So you can do an ES-based text search and return data stored in PG, along with doing joins to other tables. ZomboDB even works with Citus to handle PG-level sharding.
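A rough sketch of what that looks like, based on ZomboDB's documented usage (the table, the Elasticsearch URL, and the index options are illustrative; check the docs for the version you install):

```sql
CREATE EXTENSION zombodb;

-- The index is backed by an Elasticsearch cluster rather than local storage.
CREATE INDEX products_zdb_idx
          ON products
       USING zombodb ((products.*))
        WITH (url = 'http://localhost:9200/');

-- The full-text query executes in ES; rows (and joins) stay in Postgres.
SELECT p.*, u.name
  FROM products p
  JOIN users u ON u.id = p.owner_id
 WHERE p ==> 'wireless keyboard';
```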


Thanks @rorygibson. Yes, that design will definitely work. Just one doubt: Are you syncing data between ES and PSQL for common entities like users? If yes, how are you doing it?



