Hacker News

Write performance is a much simpler metric than query performance, which is highly dependent on the actual query being performed. Plus, many time-series settings actually need to support high write rates, which vanilla RDBMS tables can't.

On the query side, we find that most queries to a time-series DB actually include a time predicate, LIMIT clause, etc. It's pretty rare that you do a full table scan over the 100B rows. (And for these types of broad scans, performance depends on # disks and use of query parallelization.)
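To make the shape of such queries concrete, here's a sketch of a typical time-series query with a time predicate and a LIMIT. The table and column names (`conditions`, `device_id`, `temperature`) are hypothetical, not from TimescaleDB's benchmarks:

```sql
-- Filter by a recent time window and cap the result set,
-- rather than scanning the full table.
SELECT device_id, avg(temperature) AS avg_temp
FROM conditions
WHERE time > NOW() - INTERVAL '1 day'
GROUP BY device_id
ORDER BY avg_temp DESC
LIMIT 10;
```

The time predicate lets the planner touch only the chunks (and their indexes) covering the last day, instead of all 100B rows.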

I'm not sure I understand the comment that the benchmark repo doesn't include the performance comparison. That repo is meant to accompany a blog post, which discusses the results (https://blog.timescale.com/timescaledb-vs-6a696248104e), while the repo allows you to replicate them.



I mentioned the benchmark repo because I wanted to understand why you usually advertise write performance instead of query performance: the repo shares results for write performance but not for queries. Later I saw the query benchmarks in your blog post, which was great.

I agree that a full-table scan is not common in time-series use cases, and that you can't improve performance there unless you use a different storage format. The confusing part for me: if I have 100B rows, I would probably use a distributed (multi-node) solution, unless the dataset spans 50 years and I only want to query the last week, because PostgreSQL is not good enough at aggregating huge datasets.

Do you have any plans to release a distributed version (where chunks may be distributed among the nodes in a cluster) or to implement a columnar storage format?


Yes, we're working on a distributed version of Timescale as you describe.

But two clarifications:

1. It can aggregate better than you might think. We've had people run single-node Timescale with 20+ disks, then couple that with query parallelization, and you can do pretty good aggregation over larger datasets.

Plus, because of the way the data is partitioned, a GROUP BY will actually get good locality over the disjoint data (i.e., groups can be local to a chunk) and generate more efficient plans given the smaller per-chunk indexes.
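As an illustration of that locality, consider grouping by a time bucket on a hypertable partitioned by time. The table and column names (`metrics`, `cpu`) are hypothetical; `time_bucket` is TimescaleDB's bucketing function:

```sql
-- Each hourly group falls within a single time-partitioned chunk,
-- so the planner can work with the smaller per-chunk indexes.
EXPLAIN
SELECT time_bucket('1 hour', time) AS hour, avg(cpu)
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY hour
ORDER BY hour;
```

The EXPLAIN output should show scans restricted to the chunks covering the last 7 days rather than the whole hypertable.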

(And the various cloud platforms make it really easy to attach many disks to a single machine. Our published benchmarks are on network-attached SSDs.)

2. You can use read-only clustering today, i.e., with standard Postgres synchronous or asynchronous replication. So you can scale your query rates with the replicas as well.


Thanks for the clarification.

1. Do you use PostgreSQL 9.6 query parallelization (https://www.postgresql.org/docs/9.6/static/parallel-plans.ht...) or your own method for processing chunks in parallel? When we have >1B rows with >20 columns, I/O usually becomes the bottleneck in our experience. If you use multiple disks and parallelize the work among different CPU cores, I guess that would help.


We currently support 9.6 query parallelization, and are also considering extending it with some of our own chunk-level methods.
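For reference, 9.6 parallel query is tunable per session with the standard GUC; the setting name is stock PostgreSQL, while the table name and value here are illustrative:

```sql
-- Allow up to 4 parallel workers per Gather node (9.6+).
SET max_parallel_workers_per_gather = 4;

-- A broad aggregation that the planner may parallelize:
EXPLAIN
SELECT count(*)
FROM conditions
WHERE time > NOW() - INTERVAL '30 days';
```

If the planner chooses a parallel plan, the EXPLAIN output will include a Gather node with parallel sequential scans beneath it.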

(Timescale supports multiple disks either through RAID or tablespaces. Unlike PG, you can add multiple tablespaces to a single hypertable.)
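A rough sketch of the tablespace route, assuming TimescaleDB's `attach_tablespace` function (the disk path, tablespace name, and hypertable name are hypothetical):

```sql
-- Create a tablespace on a second disk and attach it to an
-- existing hypertable, so new chunks can be placed on it.
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';
SELECT attach_tablespace('disk2', 'conditions');
```

Repeating this per disk spreads chunks across spindles, which is what lets a multi-disk single node parallelize I/O.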

Happy to also go into more details on Slack (https://slack-login.timescale.com) or email.





