Howdy! All of the details about our TSBS settings in the performance section of the docs. Also, we'll be streaming a sample benchmark of the two databases next Wednesday at 10AM ET/4PM CET.
> We tried multiple batch sizes and found that in most cases there was little difference in overall insert efficiency
This is wrong in CH world where batch size matters a lot. I would recommend keep this even more higher around (10x of current value).
Humble Suggestion: There are many things not quite properly interpreted about CH and reading through the blog it seems like you're focusing more on areas which CH is lacking/missing. Please don't do these things.
- The code that TSBS uses was contributed by Altinity[1]. If there is a better setup, please feel free to submit a PR. As stated elsewhere, we did have a former CH engineer review and even updated ClickHouse to the newest version __yesterday__ based on his suggestion to ensure we had the best numbers. (and some queries did improve after upgrading, which are the numbers we presented)
- It seems like you read the article (great job - it was long!!), so I'm sure you understand that we were trying to answer performance and feature questions at a deeper level than almost any benchmark we've seen to date. Many just show a few graphs and walk away. We fully acknowledged that smaller batches are not recommended by CH, but something many (normally OLTP) users would probably have. It matters and nobody (that we know of) has shown those numbers before. And in our test, larger batch sizes do work well, but not to some great magnitude in this one server setup. Did 10k or 20k rows maybe go a little faster for CH? Sometimes yes, sometimes negligible. The illustration was that we literally spent months and hundreds of benchmark cycles trying to understand the nuances.
I think we're pretty clear in the post that CH is a great database for the intended cases, but it has shortcomings just like TimescaleDB does and we tried to faithfully explore each side.
This is what tend to make all vendor benchmarks "benchmarketing" - while many of us fully intend to give a fair shot to other technologies we tend to know best practices for our own software better than "competition"
We used the Time Series Benchmark Suite for all these tests https://github.com/timescale/tsbs. Also, Ryan (post author) will be giving all the config details in a Twitch stream happening next Wednesday. We'll be uploading the video to Youtube immediately afterwards too >>