Hacker News

Write performance is a much simpler metric than query performance, which is highly dependent on the actual query being performed. Plus, many time-series settings actually need to support high write rates, which vanilla RDBMS tables can't.

On the query side, we find that most queries to a time-series DB actually include a time predicate, LIMIT clause, etc. It's pretty rare that you do a full table scan over the 100B rows. (And for these types of broad scans, performance depends on # disks and use of query parallelization.)
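To make the shape of such queries concrete, here's a sketch of a typical time-series query with a time predicate and a LIMIT. The table and column names (`conditions`, `device_id`, `temperature`) are hypothetical, not from TimescaleDB's benchmarks:

```sql
-- Filter by a recent time window and cap the result set,
-- rather than scanning the full table.
SELECT device_id, avg(temperature) AS avg_temp
FROM conditions
WHERE time > NOW() - INTERVAL '1 day'
GROUP BY device_id
ORDER BY avg_temp DESC
LIMIT 10;
```

The time predicate lets the planner touch only the chunks (and their indexes) covering the last day, instead of all 100B rows.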

I'm not sure I understand the comment that the benchmark repo doesn't include the performance comparison. That repo is meant to accompany a blog post, which discusses the results (https://blog.timescale.com/timescaledb-vs-6a696248104e), while the repo allows you to replicate them.



I mentioned the benchmark repo because I wanted to understand why you usually advertise write performance instead of query performance: the repo shares results for write performance but not for queries. Later I saw the query benchmarks in your blog post, which was great.

I agree that a full-table scan is not common in time-series use cases, and that you can't improve performance there unless you use a different storage format. The confusing part for me: if I have 100B rows, I would probably use a distributed (multi-node) solution, unless the dataset spans 50 years and I only want to query the last week, because PostgreSQL is not good enough at aggregating huge datasets.

Do you have any plans to release a distributed version (where chunks may be distributed among the nodes in a cluster) or to implement a columnar storage format?


Yes, we're working on a distributed version of Timescale as you describe.

But two clarifications:

1. It can aggregate better than you might think. We've had people run single-node Timescale with 20+ disks, then couple that with query parallelization, and you can do pretty good aggregation over larger datasets.

Plus, because of the way the data is partitioned, a GROUP BY will actually get good locality over the disjoint data (i.e., groups can be local to a chunk) and generate more efficient plans given the smaller per-chunk indexes.
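As an illustration of that locality, consider grouping by a time bucket on a hypertable partitioned by time. The table and column names (`metrics`, `cpu`) are hypothetical; `time_bucket` is TimescaleDB's bucketing function:

```sql
-- Each hourly group falls within a single time-partitioned chunk,
-- so the planner can work with the smaller per-chunk indexes.
EXPLAIN
SELECT time_bucket('1 hour', time) AS hour, avg(cpu)
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY hour
ORDER BY hour;
```

The EXPLAIN output should show scans restricted to the chunks covering the last 7 days rather than the whole hypertable.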

(And the various cloud platforms make it really easy to attach many disks to a single machine. Our published benchmarks are on network-attached SSDs.)

2. You can use read-only clustering today, i.e., with standard Postgres synchronous or asynchronous replication. So you can scale your query rates with the replicas as well.


Thanks for the clarification.

1. Do you use PostgreSQL 9.6 query parallelization (https://www.postgresql.org/docs/9.6/static/parallel-plans.ht...) or your own method for processing chunks in parallel? When we have >1B rows with >20 columns, I/O usually becomes the bottleneck in our experience. If you use multiple disks and parallelize the work among different CPU cores, I guess that would help.


We currently support 9.6 query parallelization, and are also considering extending it with some of our own chunk-level methods.
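For reference, 9.6 parallel query is tunable per session with the standard GUC; the setting name is stock PostgreSQL, while the table name and value here are illustrative:

```sql
-- Allow up to 4 parallel workers per Gather node (9.6+).
SET max_parallel_workers_per_gather = 4;

-- A broad aggregation that the planner may parallelize:
EXPLAIN
SELECT count(*)
FROM conditions
WHERE time > NOW() - INTERVAL '30 days';
```

If the planner chooses a parallel plan, the EXPLAIN output will include a Gather node with parallel sequential scans beneath it.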

(Timescale supports multiple disks either through RAID or tablespaces. Unlike PG, you can add multiple tablespaces to a single hypertable.)
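A rough sketch of the tablespace route, assuming TimescaleDB's `attach_tablespace` function (the disk path, tablespace name, and hypertable name are hypothetical):

```sql
-- Create a tablespace on a second disk and attach it to an
-- existing hypertable, so new chunks can be placed on it.
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';
SELECT attach_tablespace('disk2', 'conditions');
```

Repeating this per disk spreads chunks across spindles, which is what lets a multi-disk single node parallelize I/O.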

Happy to also go into more details on Slack (https://slack-login.timescale.com) or email.





