Hacker News

For a product like yours, don't you think a NoSQL solution would be a better fit? If yes, why didn't you opt for it? If no, then why (especially considering the fact that your schemas are always in a state of flux)? PS: This is under the assumption that you have used Postgres for storing all the data from different sources.


Not OP, but having structure in a datastore is important. It decreases development and maintenance time and doesn't let vague bugs fester for years. NoSQL is a premature optimization that many people make without understanding the downsides.


Completely agree. But in the above case, the structure of the third-party service API will always change, and each change may require a migration. I worked on a similar problem in my last stint and used Postgres. There I opted for predefined fields and handled the extra logic in application code. In the above case, I just wanted to know if there is a better way to solve the problem in Postgres.


I get a lot of mileage out of using real columns for the data with fixed structure, and Postgres JSONB columns for the parts of the data whose structure varies.
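A sketch of that hybrid approach, using a hypothetical `events` table (the column names and sample query are illustrative):

```sql
-- Fixed, well-understood attributes get real columns;
-- the varying third-party payload goes into a JSONB column.
CREATE TABLE events (
    id         bigserial   PRIMARY KEY,
    source     text        NOT NULL,          -- e.g. 'stripe', 'github'
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb       NOT NULL           -- whatever the upstream API sent
);

-- A GIN index makes containment queries on the flexible part fast.
CREATE INDEX events_payload_idx ON events USING gin (payload);

-- Queries can mix relational predicates with JSONB operators.
SELECT id, payload->>'status'
FROM events
WHERE source = 'stripe'
  AND payload @> '{"type": "invoice.paid"}';
```

When the upstream API changes, only the contents of `payload` change; no migration is needed unless a field graduates into a real column.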


Hi devj, You're absolutely correct that an RDBMS would be an odd solution for a search index *

Anyway, on CTX I use PostgreSQL for storing things that deserve to be relational - job control, batching, users, accounts, billing, invitation codes...

All the indexed content is in an Elasticsearch cluster (http://elastic.co if you're unfamiliar - it's a specialised search indexing data store built atop Apache Lucene).

* though actually PostgreSQL has a really good capability as a JSON store that means people do sometimes use it as a NoSQL solution


PostgreSQL has really good capability not just as a JSON store, but also for search and indexing.

Its text search solutions come close to ElasticSearch, with - surprisingly - even better performance.
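For reference, Postgres's built-in full-text search in a minimal form, on a hypothetical `docs` table (the generated column requires PostgreSQL 12+; on older versions a trigger maintains the `tsvector` instead):

```sql
CREATE TABLE docs (
    id   bigserial PRIMARY KEY,
    body text NOT NULL
);

-- Precompute the tsvector and index it so searches don't re-parse every row.
ALTER TABLE docs
    ADD COLUMN body_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;

CREATE INDEX docs_body_tsv_idx ON docs USING gin (body_tsv);

-- Ranked search:
SELECT id, ts_rank(body_tsv, query) AS rank
FROM docs, to_tsquery('english', 'search & index') AS query
WHERE body_tsv @@ query
ORDER BY rank DESC;
```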


> Its text search solutions come close to ElasticSearch, with - surprisingly - even better performance.

PG search may be good enough for certain use cases, but it doesn't come close to the power of Lucene/Solr/ES. PG only recently added support for phrase search, which Lucene has had for years. Lucene has an extremely flexible analysis pipeline, BM25, "more like this" queries, simple custom ranking, great language support, "did you mean?", autocomplete, etc. ES in particular is built from the ground up as a clustered solution.

As I said PG may be good enough for some use cases, but to claim it is as good as ES is laughable. And PG certainly doesn't make sense if your product is literally a search product.
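For context, the phrase search mentioned above landed in PostgreSQL 9.6 as the `<->` (followed-by) tsquery operator; a minimal illustration:

```sql
-- Matches only when 'full' is immediately followed by 'text'.
SELECT to_tsvector('english', 'full text search in postgres')
       @@ to_tsquery('english', 'full <-> text');        -- true

-- phraseto_tsquery builds the same kind of query from plain text.
SELECT to_tsvector('english', 'full text search in postgres')
       @@ phraseto_tsquery('english', 'text search');    -- true
```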


All things you mentioned can be done in PGSQL with similar performance, actually.

I had to build a search product in PGSQL, and I’ve done exactly that (and compared it with ES, Solr and Lucene).

For datasets below 50GB the performance is basically the same, you get about the same features, and it works well enough.

But you are right, as soon as you want to build more complicated search products, or as soon as you get larger amounts of data, you want a dedicated solution.


tbh I haven't looked into PostgreSQL as a direct competitor to ES in the text search space, mostly because I have quite a lot of ES experience with things like aggregation for faceting, stemming and textual analysis, and I had a (perhaps incorrect) assumption that PostgreSQL was less capable in those more specialised areas.

One thing I would like to understand is how the PostgreSQL search experience scales with data volume. My main cluster for CTX will potentially get (very) big and I don't know enough about PostgreSQL, multi-master and full-text search across large amounts of data.

I'll definitely do some digging though, thanks!


> tbh I haven't looked into PostgreSQL as a direct competitor to ES in the text search space, mostly because I have quite a lot of ES experience with things like aggregation for faceting, stemming and textual analysis, and I had a (perhaps incorrect) assumption that PostgreSQL was less capable in those more specialised areas.

I think your assumption is/was correct. And I'm a postgres dev, so I'm biased as hell...


PostgreSQL is very capable at stemming and textual analysis, but the issue, as you likely know, starts with large amounts of data – because then you suddenly need multiple replicas, and that’s where the issues start.


You might want to look at https://github.com/zombodb/zombodb. It extends PG to use ES as an external source for indexes. So you can do an ES-based text search and return data stored in PG, along with doing joins to other tables. ZomboDB even works with Citus to handle PG-level sharding.
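A rough sketch of what that looks like, based on ZomboDB's documented usage (the table, the Elasticsearch URL, and the index options are illustrative; check the docs for the version you install):

```sql
CREATE EXTENSION zombodb;

-- The index is backed by an Elasticsearch cluster rather than local storage.
CREATE INDEX products_zdb_idx
          ON products
       USING zombodb ((products.*))
        WITH (url = 'http://localhost:9200/');

-- The full-text query executes in ES; rows (and joins) stay in Postgres.
SELECT p.*, u.name
  FROM products p
  JOIN users u ON u.id = p.owner_id
 WHERE p ==> 'wireless keyboard';
```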


Thanks @rorygibson. Yes, that design will definitely work. Just one doubt: Are you syncing data between ES and PSQL for common entities like users? If yes, how are you doing it?



