More war stories: I found Timescale easier to set up (maybe just because it's more familiar), but raw query performance is not something you magically get for free. Timescale requires a lot of investment in planning. In one project we had simple time-range scan queries against a smaller-than-RAM table taking tens of seconds to complete.
ClickHouse has a bit more ops overhead, but requires very little in the way of pre-planning. Just throw whatever you want at it and it seems to sing by default.
Regarding ops overhead, ClickHouse also has a "local" mode where you can query a huge range of compressed formats without ever performing any kind of ETL step. That means queries can run e.g. directly against S3 storage. For batch logs analysis, IMHO local is a gamechanger. Most bulk logging systems produce massive amounts of S3 objects, and ClickHouse lets you tear through these infrequently (but at high speed) when desired without any bulky/opsey ETL step, and no cluster running idle just waiting for a handful of daily jobs to arrive.
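To make that concrete, here's roughly what a local-mode run against S3 looks like (bucket, paths, and columns are made up, and the exact s3() arguments vary by ClickHouse version):

    clickhouse-local --query "
        SELECT status, count() AS hits
        FROM s3(
            'https://my-bucket.s3.amazonaws.com/logs/2021/10/*.json.gz',
            'JSONEachRow',
            'ts DateTime, status UInt16, path String',
            'gzip')
        GROUP BY status
        ORDER BY hits DESC"

No server, no ETL step - the data stays as gzipped JSON in the bucket and only gets read when you actually run the query.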
(I love this style of system in general, but the clear engineering work behind ClickHouse completely won me over, I'd apply it anywhere I could)
Our anecdata: we store telemetry per thing. After loading a month's worth of data, TimescaleDB as hosted by their cloud ran a difference aggregation in seconds. ClickHouse routinely did it in 20 millis.
Simple avg etc. were better, but ClickHouse was always an order of magnitude faster than Timescale. We didn't invest a whole bunch into optimization other than trying some indexing strategies in TimescaleDB.
Thanks for the feedback. Without knowing your situation, one of the things we show in the blog post is that TimescaleDB compression often changes the game on those kinds of queries (data is transformed to columnar storage when you compress). You don't mention if you did that or not, but it's something we've seen/noticed in every other benchmark at this point - that folks don't enable it for the benchmark.
And second point of the article is that you have lots of options for whatever works in your specific situation. But, make sure you're using the chosen database features before counting it out. :-)
I wonder if it's worth taking a page out of the MongoDB book and enabling these kinds of benchmark-altering settings by default. We certainly selected ClickHouse over Timescale internally because of major performance differences in our internal testing that might have gone the other way had we "known better".
From my experience of benchmarking these databases on scientific data (highly regular timeseries) and looking at the internals of both, these kinds of numbers happen when answering the query requires crunching through many rows but the output has few, i.e. the queries are filtering and/or aggregating a ton of input rows that can't be excluded by indexes or served from preaggregations.
From what I can tell it comes down to execution engine differences. Timescale, even with compressed tables, uses a row-by-row execution engine architecturally resembling IE6-era JS engines. ClickHouse uses a batched and vectorized execution engine utilizing SIMD. The difference is one to two orders of magnitude of throughput in terms of raw number of rows per core pushed through the execution engine.
Postgres/Timescale could certainly also implement a similar model of execution, but to call it an undertaking would be an understatement considering the breadth and extensibility of features the execution engine would need to support. To my knowledge no one is seriously working on this outside of limited-capability hacks like the vops or PG-Strom extensions.
You do a great job summarizing some of the benefits of ClickHouse we mentioned in the post, including the vectorized engine!
That said, I'm not sure I'd refer to the PostgreSQL/TimescaleDB engine architecture as resembling IE6 JS support. Obviously YMMV, but every release of PG and TimescaleDB brings new advancements to query optimizations for the architecture they are designed for, which was the focus of the post.
I'm personally still impressed, after 20+ years of working with SQL and relational databases, when any optimization engine can use statistics to find the "best" plan among (potentially) thousands in a few ms. Maybe I'm too easily impressed. :-D
The optimization engine is of course great (despite occasionally missing hard), but I am not referring to it. I am referring to the way PostgreSQL executes query plans: the way rows are pulled up the execution tree is very similar to first-iteration JavaScript engines - a tree-based interpreter. Picking out columns from rows and evaluating expressions used to work the same way until PG 11, where we got a bytecode-based interpreter and a JIT for those. But so far rows still work the same way, and it hurts pretty badly when row lookup is cheap and the rows end up either thrown away or aggregated together with basic math.
With TimescaleDB compression, 1000 rows of uncompressed data are compressed into column segments, moved to external TOAST pages, and then pointers to these column segments are stored in the table's "row" (along with other statistics, including some common aggregates).
So while the query processor might still be "row-by-row", each "row" it processes actually corresponds to a column segment for which parallelization/vectorization is possible. And because these column segments are TOASTed, the row itself is just pointers, and you only need to read in those compressed column segments that you are actually SELECTing.
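(For anyone wanting to try this, enabling it looks roughly like the following - hypothetical hypertable and column names, using the TimescaleDB 2.x policy call:)

    -- Pick segmentby/orderby columns that match your query patterns
    ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'time DESC'
    );
    -- Compress chunks once they are older than a week
    SELECT add_compression_policy('metrics', INTERVAL '7 days');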
Anyway, you might have known this already - just wanted to clarify. Thanks for the discussion!
Yeah, very interesting. I was wondering how Timescale pushed Postgres more toward columnar without rewriting a bunch of Postgres itself.
My understanding of TOAST is that it is itself just a bunch of rows in a TOAST table that split the compressed "row" (or in this case "1000 rows of 1 column") across as many rows as required to store the data while remaining within the Postgres page size limit (normally 8kB).
With the often-quoted Postgres per-row overhead of ~23 bytes, which you would have to pay for each TOAST row as well, does this not add up and eat into your storage efficiencies? Or does compression work so well that the 23 bytes x N rows (1 row pointing to TOAST + N TOAST rows) required to store the "row" isn't important?
Does Timescale do its own compression algorithm too? I see that in PG 14 TOAST column compression can be lz4 instead of the out-of-the-box pglz, which apparently has a few problems; I see mentions on the mailing list of significant possible optimizations. When dealing with EBS-style storage where read latencies can be multiple millis, compression is always going to be a win, but it's an easy optimization either way I'd think.
Timescale implements its own compression algorithms. It includes several, and automatically chooses the algorithm based on the data type of each column.
- Gorilla compression for floats
- Delta-of-delta + Simple-8b with run-length encoding compression for timestamps and other integer-like types
- Whole-row dictionary compression for columns with a few repeating values (+ LZ compression on top)
- LZ-based array compression for all other types
This means within even the same table, different columns will be compressed using different algorithms based on their type (or inferred entropy).
Was this for your primary source-of-truth, or more of a downstream data warehouse, or something else?
I'm struggling to imagine a case where these are the two things being considered; Timescale is the obvious choice for a primary database, Clickhouse the obvious choice for a warehouse. I wouldn't let my user-facing app write to Clickhouse, and while I could potentially get away with a read-only Timescale replica for internal-facing reports I would expect to eventually outgrow that and reach for Clickhouse/Snowflake/Redshift.
has some good thoughts. The main thing you'll likely need is some sort of a buffer layer so you can do bulk inserts. Do not write a high-volume of single-row inserts into Clickhouse.
Thanks for sharing the link! I've heard the bulk-insert thing before and, to be honest, I've always thought that RDBMSs don't love single-row inserts either. Seems ClickHouse takes that to a new level.
In our case we are using sqs and usually insert 20-100 rows into the db at a time so I’m going to benchmark how that does in clickhouse.
Rows inserted into the buffer table are then flushed to the normal/base table when one of the limits (defined when the buffer table is created) is reached (limits are max rows, max bytes, max time since the last flush), or when you drop the buffer table.
I'm using it and it works (the performance difference can be huge compared to performing single inserts directly into a real/normal table), but be careful - the flushed rows come with no guarantee about the sequence in which they are flushed, so using a buffer table is a very bad idea if your base table relies on receiving rows in the correct order.
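For reference, the buffer table being described here is ClickHouse's Buffer engine; a minimal sketch (database/table names hypothetical, thresholds to taste):

    -- Writes go to events_buffer; rows are flushed to the base table 'events'
    -- when all the min thresholds are met or any max threshold is hit.
    CREATE TABLE events_buffer AS events
    ENGINE = Buffer(default, events, 16,
                    10,       100,         -- min/max seconds since last flush
                    10000,    1000000,     -- min/max rows
                    10000000, 100000000);  -- min/max bytes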
I suppose it depends what you're going to let your user do, but OLAPs in general and Clickhouse in particular don't do well under row-oriented workloads, as described in the post here. I'm imagining users primarily operating on small numbers of rows and sometimes making updates to or deleting them, a worst-case scenario for Clickhouse but best-case for an OLTP like Postgres.
Ah totally. Thanks for sharing your thoughts! In my case I’m evaluating clickhouse as a source of truth for customer telemetry data. Totally agree about the OLTP limitations.
(Remember that clickhouse is not reliable. It doesn’t pretend to be.
Clickhouse is great for lots of common query workloads, but if losing your data would be a big deal then it makes a lot of sense to have your data in a reliable and backed up place (eg timescale or just s3 files or whatever) too.
Of course lots of times people chuck stuff into clickhouse and it’s fine if they lose a bit sometimes. YMMV.)
I have not found this to be the case. Like any system you need to take precautions (replicas and sharding) to ensure no data loss, but I didn't find that to be challenging. In what way have you found ClickHouse particularly risky in this way?
It’s basic computer science. Clickhouse doesn’t fsync etc.
Clickhouse (and other systems with the same basic architecture, like Elasticsearch and, shudder, MongoDB) work very well on the happy path. They are not advertising themselves as ACID.
Agreed, I wouldn't use Clickhouse for usual warehouse stuff either, mostly because I can't imagine it plays well with dbt which is a non-starter these days.
I'd still argue Clickhouse is closer to Snowflake/Redshift than anything OLTP, and their name is intentionally chosen to evoke warehouse-like scenarios.
Is your comment on ClickHouse and DBT based on using the DBT ClickHouse plugin? [0] If so I would be very interested in understanding what you or others see as deficiencies.
How many data points were those aggregations being computed over? How much memory does your Postgres server have, and are you using SSD storage (with associated postgres config tweaks)?
ClickHouse shines at scales that Timescale has no hope of ever supporting. Hence the choice of workloads in the test. Cloudflare was ingesting 6,000,000 rows per second into a 36-node (dual E5-2630) ClickHouse cluster (2018), which was something like 20PB of data per year.
6,000,000 rows inserted per second is great! And if you need that for your workload, then you probably should choose ClickHouse over TimescaleDB (well at least, for now ;-)
The reason we don't include that in the benchmark is that most developers _don't need_ 6,000,000 rows inserted per second.
And also - that performance doesn't come for free, but requires giving up a lot of things that most developers may need: e.g., no transactions, immutable tables (can't easily update / delete data), SQL-like but not quite SQL query language, inefficient joins, inefficient for point queries retrieving single rows by their keys, etc. (We go into much more detail in the blog post.)
So it comes down to the fundamental analogy used in the post: Do you need a car (versatility) or a bulldozer (narrow specialization)?
If the answer is that you need to support 6,000,000 rows inserted per second, then by all means, choose the bulldozer.
> ClickHouse shines at scales that timescale has no hope of ever supporting.
I'm not sure if this was a throwaway line, or if it was the result of a detailed analysis of TimescaleDB's architecture, but if you don't mind, I'll share this: with TimescaleDB multi-node [0] we are getting close to that performance, and the product keeps getting better.
Those are great, impressive numbers. We certainly don't claim to be all things to all people, but the benchmark was run using single instances mostly because that is what most other benchmarks published by others have done.
With a multi-node TimescaleDB cluster, ingest does literally scale to millions of rows/second and we have numerous users achieving great numbers. One Fortune 100 company has a 25+ node TimescaleDB cluster backing their network monitoring stack and loves it for their use case.
At some point, when we can, I'm sure we'll start to do more with multi-node benchmarking too to give some input to the conversation.
And then it will come down to the spec of the nodes, the actual fields, and so on. Also, batch size obviously plays a big role here, as CH is optimized for very large batch sizes and the benchmark is not really using that kind of batch size. BTW, I am not involved with CH, but any vendor benchmarking their wares will always select params that make their offering look good.
Sure, these tests were not using really large batch sizes because of the other benchmarks we were trying to replicate (but with more detail). Honestly, for this single-instance setup, we saw improvement in CH as we went from (say) 5k to 10k or 20k row batches. But it was a few percentage points at a time, not a magnitude different. I'm sure things change with a cluster setup too, that just wasn't the focus of this post.
Interestingly, we were just testing a multi-node TimescaleDB cluster the other day and found that 75k rows/batch was the optimal size as nodes increased.
So you're completely correct. I tried to be very clear that we were not intentionally "cooking the books" and there are surely other optimizations we could have made. Most of the suggestions so far, however, require further setup of CH features that haven't been used in other benchmarks, so we tried to over-communicate our strategy and process.
We also fully acknowledged in the post that a siloed "insert", wait, then "query" test is not real world. But it's the current way TSBS has been used, and other DB engines have come along and used that methodology for now. Maybe that process will change in time with other contributions.
BTW, we'll discuss some of this next week during the live-stream and the video will be available after.
That’s a really thorough comparison. Much more detailed than I expected.
From what I see, the trade off in disk space usage would point me toward Timescale for most of my workloads. The insert performance tradeoff just wouldn’t justify the difference for me.
Thanks for the compliment! It's becoming a habit with us and benchmarks. We just really want to dig in and understand what's going on and why things work the way they do. ;-)
There really are so many nuances and as we tried to say a number of times, ClickHouse is really great at what it does well. But it's still OLAP at heart (which precludes OLTP features many apps take for granted) and after enabling TimescaleDB compression, the query story isn't as cut and dry. We don't claim that TimescaleDB is the fastest in all circumstances, or that it absolutely has to be for every workload. Features and versatility play a major part in the decision.
Great question. Yes, eventually it does, but (at least for now) it wasn't something we could reliably force as part of the query cycle and know everything was in its "best" state with ClickHouse. To be honest, we didn't provide the final compressed size of either database because of the need to wait.
The code that's currently used by TSBS was submitted by Altinity, a heavy supporter of ClickHouse in the U.S., but TSBS is open source and anyone is welcome to contribute and make the process/test better!
May be worth pointing that out in the article since the increased disk usage has been mentioned multiple times in the article without any indication that it's only temporary until ClickHouse merges the parts.
Concerning ClickHouse, yes it does - it's exactly the same as when you have 2 compressed files each containing 100 sorted rows: when you merge those 200 rows into a single file, sort them, and compress them, the result will be smaller than the sum of the 2 separate files.
How much you save is again exactly the same as when dealing directly with files: it depends on the data and on the compression algo.
Actually Altinity is the one that contributed the bits to TSBS for benchmarking ClickHouse[1], so we are using the work that they contributed (and anyone is welcome to make a PR for updates or changes). We also had a former ClickHouse engineer look at the setup to verify it matched best practices with how CH is currently designed, given the TSBS dataset.
As for the optimizations in the article you pointed to from 2019 (specifically how to query "last point" data more efficiently in ClickHouse), it uses a different table type (AggregatedMergeTree) and a materialized view to get better query response times for this query type.
We (or someone in the community) could certainly add that optimization to the benchmark, but it wouldn't be using raw data - which we didn't think was appropriate for the benchmark analysis. But if one wanted to use that optimization, then one should also use Continuous Aggregates for TimescaleDB - ie for an apples to apples comparison - which I think would also lead to similar results to what we show today.
It's actually something we've talked about adding to TSBS for TimescaleDB (as an option to turn on/off) and maybe other DBs could do the same.
I think the most important thing is that ClickHouse is NOT designed for small-batch insertion; if you need to do 1000s of inserts/sec you put a queue in front of ClickHouse. And query speed can be impacted by batch size a lot. So have you looked at query performance with the optimal batch size?
Yep! The blog post includes data and graphs from both large (5000-15,000 rows / batch) and small (100-500 rows / batch) sizes. Please see the section "Insert Performance". Thanks!
1) This is also a small batch size. If you're inserting 500,000 rows/sec, 5,000 rows is not a particularly large batch.
2) I see different graphs for ingest but not for queries. The data layout will depend on the batch size, unless of course you ran OPTIMIZE before running queries.
1) You're absolutely right, 5k rows isn't "large". We also mentioned that we did hundreds of tests, often going between 5k and 15k rows/batch. The overall ingest/query cycle didn't change dramatically in any of these. That is, 5k rows was within a few percentage points of 10k rows. Interestingly, the benchmarks that Altinity has only used 10k rows/batch (which we also did, it just didn't have any major impact in the grand scheme of things).
2) We did not specifically call OPTIMIZE before running queries. Again, learning from the leaders at Altinity and their published benchmarks, I don't see any references that they did either, and neither does the TSBS code appear to call it after ingest.
Happy to try both of these during our live stream next week to demonstrate and learn!
Horizontal scalability and compression are also unbeatable from what I've seen, to name a few.
There's a hefty price tag, however: ClickHouse is quite ops heavy and its observability has a seriously steep learning curve. Only go for ClickHouse in production if you really know what you're doing :)
Could you explain what you mean by ops heavy? Just curious. We actually have a production system where Timescale as well as ClickHouse are running in parallel. So far ClickHouse hasn't caused any trouble, but it is rarely used right now.
It is not ops heavy. I believe they are trying to sell you on a managed service with that sentence. It is in your best interest for your team to learn how to maintain ClickHouse themselves. The k8s operator will take you far. By the time you need multi-node clusters you will already have a good idea of how it works. If you need a multi cluster deployment from the start then go with some consulting service I guess, but even then I wouldn’t call it “ops heavy”; the clusters are pretty homogeneous, just a bunch of nodes talking over ZK.
It would be awesome to combine the following things:
* PostGIS
* Timescale
* Citus
* Zedstore
This truly would be the relational DB to end all relational DBs. Unfortunately, we run into a couple problems:
* Managing multiple extensions is a burdensome task, which should be in the wheelhouse of cloud providers, but...
* Timescale and Citus are open core, holding back features for customers. Their primary revenue channels are their cloud offerings. Unfortunately you can't get Citus and Timescale in the same cloud offering, because you're dealing with two separate companies.
* PostGIS has multiple cloud providers, but none of them have Timescale or Citus available.
* Citus only has cloud offerings on Azure, excluding the other two major players that often have exclusive relationships with companies.
* Zedstore is really cool and together with Citus could be a massive gamechanger by having columnstore and rowstores in the same distributed database. However, development has stalled, and nobody seems to be able to explain what happened.
> Timescale and Citus are open core, holding back features for customers.
One clarification. While TimescaleDB is open-core, our community version is source-available and 100% free to use. We do not "hold back features for customers". You do not need to pay to use any of TimescaleDB's best features, it's all free via the Timescale Community license.
You only pay if you'd like to use our hosted offerings (and save the hassle of self-managing your DB): Timescale Cloud or Managed Service for TimescaleDB.
Timescale Cloud indeed comes with PostGIS installed by default.
Regarding distributed (Citus) and columnar (Zedstore):
- TimescaleDB's compression actually takes a columnar approach (including that it only reads the individual compressed columns that you SELECT), and so combines both row- and column-oriented data. [0]
- TimescaleDB 2.0 also supports distributed deployment, and Timescale Cloud will (very soon) offer one-click deployment of fully-managed multi-node TimescaleDB. [1]
If you use Aiven for a cloud PG instance you can do both Timescale + PostGIS installed.
I also really wish ClickHouse would prioritize PostGIS support - IIRC it has been on their roadmap for a while but keeps getting kicked around every year or so. Same thing with CockroachDB - PostGIS support kicked down the road every year.
Hi there, CockroachDB dev here. We've supported spatial features from PostGIS since 20.2 - you can get started here https://www.cockroachlabs.com/docs/stable/spatial-data.html! Of course, there's bits and pieces we've missed but if there's something you're missing in particular you can let me know here or through a GitHub issue.
Agreed. At a previous work, clickhouse outperformed timescale by several orders of magnitude, under about every condition.
The timescale team seems to recognize that (look for the comment about clickhouse being a bulldozer) but they seem to say timescale can be better suited.
In my experience, in about 1% of the cases, yes, timescale will be a better choice (ex: if you do very small batches of insertions, if you need to remove some datapoints) but in 99% of the usecases for a time series database, clickhouse is the right answer.
There seems to have been several improvements to timescale since 2018, with columnar storage, compression, etc. and that's good because more competition is always better.
But in 2021, clickhouse vs timescale for a timeseries is like postgres vs mongo for a regular database: unless you have special constraints [*], the "cool" solution (timescale or mongo) is the wrong one.
[*]: you may think you have a unique problem and you need unique features, but odds are, YAGNI
> In my experience, in about 1% of the cases, yes, timescale will be a better choice (ex: if you do very small batches of insertions, if you need to remove some datapoints) but in 99% of the usecases for a time series database, clickhouse is the right answer.
I always find comments like this interesting :-). Things are better for different use cases. If you find yourself inserting a lot of data in batches for OLAP-style analysis, then ClickHouse is a better choice today.
If you find yourself performing a lot of time-series related queries, and needing to build an application on top (e.g., where you might want the OLTP features of Postgres), then Timescale is the better choice.
YMMV! And that's OK :-)
> But in 2021, clickhouse vs timescale for a timeseries is like postgres vs mongo for a regular database: unless you have special constraints [*], the "cool" solution (timescale or mongo) is the wrong one.
This is also a funny statement, because TimescaleDB is built on PostgreSQL.
We actually take great pride in being a "boring" option [0] - in fact I think TimescaleDB is many ways is more "boring" than ClickHouse (again, because of its PostgreSQL foundation). But I think that's actually a good thing - because you should want your database to be "boring" - ie you shouldn't have to worry about it!
> This is also a funny statement, because TimescaleDB is built on PostgreSQL.
I know, but doing timeseries with postgres is "cool", not standard, not boring. I'd even say "risky".
> We actually take great pride in being a "boring" option
No, you're not there yet: doing timeseries with timescale is way riskier than with clickhouse, which is both a bit older (not much) and more mature (much more), while also being more widely used (even if you are doing a lot of outreach like these posts)
> No, you're not there yet: doing timeseries with timescale is way riskier than with clickhouse, which is both a bit older (not much) and more mature (much more), while also being more widely used
This is not true at all, and we explain why in the post:
1. TimescaleDB's reliability is PostgreSQL's reliability. ClickHouse has a lot of advantages, but "more reliable than PostgreSQL" is not one of them.
From the post:
PostgreSQL has the benefit for 20+ years of development and usage, which has resulted in not just a reliable database, but also a broad spectrum of rigorously tested tools: streaming replication for high availability and read-only replicas, pg_dump and pg_recovery for full database snapshots, pg_basebackup and log shipping / streaming for incremental backups and arbitrary point-in-time recovery, pgBackrest or WAL-E for continuous archiving to cloud storage, and robust COPY FROM and COPY TO tools for quickly importing/exporting data with a variety of formats. This enables PostgreSQL to offer a greater “peace of mind” - because all of the skeletons in the closet have already been found (and addressed).
2. ClickHouse, being a newer database, still has several "gotchas" with reliability: e.g., No data consistency in backups (because of its lack of support for transactions and asynchronous data modification)
From the post:
One last aspect to consider as part of the ClickHouse architecture and its lack of support for transactions is that there is no data consistency in backups. As we've already shown, all data modification (even sharding across a cluster) is asynchronous, therefore the only way to ensure a consistent backup would be to stop all writes to the database and then make a backup. Data recovery struggles with the same limitation.
The lack of transactions and data consistency also affects other features like materialized views because the server can't atomically update multiple tables at once. If something breaks during a multi-part insert to a table with materialized views, the end result is an inconsistent state of your data.
Now this trade-off - accepting less reliability for faster OLAP queries - may be fine with you. And that's OK. But stating that ClickHouse is more reliable than PostgreSQL/TimescaleDB is just not true.
You are right about transactional differences between ClickHouse and PostgreSQL but you are comparing apples and oranges. ClickHouse prioritizes speed, efficiency, and scale over consistency. These are reasonable choices, especially in the largely append-only use cases which dominate analytics.
1. I've seen relatively few messed up source tables and mat views over thousands of support cases. When they happen they can be bad for some use cases like financial analytics. They simply aren't very common. And for use cases like observability or log management it just doesn't matter to have a few lost or duplicated blocks over huge datasets.
2. ClickHouse overall is eventually consistent. There are generally differences between replicas when load is active, yet it causes relatively few practical problems in most applications as they load balance queries over replicas. Serialization is expensive and simply not very highly valued here.
3. ClickHouse uses other mechanisms than ACID transactions to ensure consistency. One good example is discarding duplicate blocks on insert into replicated tables. If there's any doubt whether an insert succeeded, you can just insert the block again. ClickHouse checks the hash and discards it. This is incredibly efficient and works without requiring expensive referential integrity (e.g., unique indexes).
4. It's just about always possible to get ClickHouse to boot even when you have corrupt underlying data (e.g., due to file system problems). I don't know how you define reliability but at least in this sense ClickHouse is extremely robust. I've never seen a server fail to start, though you might need a bit of surgery beforehand.
5. ClickHouse doesn't have transactional DDL. What it does have is features like altering tables to add new columns in a fraction of a second without locking regardless of the size of the dataset. Its behavior is close to NoSQL in this regard.
I could go on, but I think these points illustrate that ClickHouse has a different set of design choices for different problems. I would never use it for eCommerce, but it's great for analytics.
> I could go on, but I think these points illustrate that ClickHouse has a different set of design choices for different problems. I would never use it for eCommerce, but it's great for analytics.
I agree with this. You are poking at a straw man.
My reply was in response to this comment by the OP:
> No, you're not there yet: doing timeseries with timescale is way riskier than with clickhouse, which is both a bit older (not much) and more mature (much more), while also being more widely used
TimescaleDB - which some don't realize - inherits all of the reliability of PostgreSQL, i.e., the 20+ years of usage and tuning (and broad tooling ecosystem).
What I was disproving was the statement that TimescaleDB/PostgreSQL was somehow riskier than ClickHouse.
ClickHouse is impressive, but deployments are still far behind that of PostgreSQL. ClickHouse is also younger and less mature than PostgreSQL.
I can see that you are the CEO of Altinity. Nice to meet you. I'm the CEO of Timescale. I think it's important that we strive for transparency in our industry, which includes admitting our own product's shortcomings, and to accept valid criticism.
We've done that many times in this HN thread (and in the blog post). I think we would have had a more productive discussion in this HN thread if ClickHouse developers were also as transparent with ClickHouse's shortcomings.
I'm happy to continue this conversation offline if you'd like. The database market is large, the journey is long, and in many ways companies like ours are fellow travelers. ajay (at) timescale (dot) com
Hi Ajay! Thanks for the thoughtful response and email. I would love a direct meeting and will contact you shortly.
I don't mean to gloss over ClickHouse imperfections. There are lots of them. For my money the biggest is that it still takes way too much expertise in ClickHouse for ordinary developers to use it effectively. Part of that is SQL compatibility, part of it is lack of tools of which simple backup is certainly one. To the extent that ClickHouse is risky, the risk is finding (and retaining) staff who can use it properly. Our business at Altinity exists in large part because of this risk, so I know it's real.
The big aha! experience for me has been that the things like lack of ACID transactions or weak backup mechanisms are not necessarily the biggest issues for most ClickHouse users. I came to ClickHouse from a long background in RDBMS and transactional replication. Things that would be game ending in that environment are not in analytic systems.
What's more interesting (mind-expanding even) is that techniques like deduplication of inserted blocks and async multi-master replication turn out to be just as important as ACID & backups to achieve reliable systems. Furthermore, services like Kafka that allow you to have DC-level logs are an essential part of building analytic applications that are reliable and performant at scale. We're learning about these mechanisms in the same way that IBM and others developed ACID transaction ideas in the 1970s--by solving problems in real systems. It's really fun to be part of it.
My comment didn't convey this clearly, for which I heartily apologize. I certainly don't intend to portray ClickHouse as perfect and still less to bash Timescale. I don't know enough about the latter to make any criticism worth reading.
p.s., Non-transactional insert (specifically non-atomicity across blocks and tables) is an undisputed problem. It's being fixed in https://github.com/ClickHouse/ClickHouse/issues/22086. Altinity and others are working on backups. Backup comes up in my job just about every day.
We were using tags, so that "else" block isn't the one being used for ClickHouse. Regardless, the table that is created (by the community and verified by former CH engineers) orders by created_at, not time, and so that query should be the fastest DISTINCT possible.
I'm not sure why you think that's creative engineering. What you're pointing to is the depth of available configuration that the contributors to TSBS have exposed for each database. It's totally open source and anyone is welcome to add more configuration and options! I believe (although not totally sure) that Altinity and ClickHouse folks added their code a few years ago - at least it wasn't anyone on the Timescale team.
That said, we didn't actually use those scripts to run our tests. Please join us next Wednesday (10AM ET/4PM CET) to see how we set the databases up and ran the benchmarks. We'd be delighted to have you try it on your own too!
All the same tests. You simply pointed to a shell script that's configurable to run tests for each database. We provided details in the blog post of exactly what settings we used for each database (cardinality, batch size, time range, TimescaleDB chunk size, etc.) so you can use those script to configure and run the tests too.
Can someone give me a real-world example of a scenario where they actually need a time series database, like an example query with the business use case / justification? Just super curious.
We're using TimescaleDB to store log events - so each row has a timestamp, some properties, and a log message. And a lot of that is actually in a JSONB column.
Not the archetypal time series use case, but TimescaleDB is still really useful.
TSDB's compression means we can store a huge volume of data in a fraction of the space of a standard Postgres table. You can achieve even better compression ratios and performance if you spend time designing your schema carefully, but honestly we didn't see the need, as just throwing data in gets us something like 10:1 compression and great performance.
TSDB's chunked storage engine means that queries along chunking dimensions (e.g. timestamp) are super-fast, as it knows exactly which files to read.
Chunking also means that data retention policies execute nearly instantaneously, as it's literally just deleting files from disk, rather than deleting rows one-by-one - millions of rows are gone in an instant!
And best of all, this all works in Postgres, and we can query TSDB data just the same as regular data.
All that combined easily justified the decision to use TSDB - and if you're familiar with Postgres, it's actually really simple to get started with. Really, we'd have needed a business justification not to use it!
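(For anyone curious, the basic setup described above is roughly this - table and column names made up:)

    CREATE TABLE log_events (
        time        timestamptz NOT NULL,
        level       text,
        properties  jsonb,
        message     text
    );
    -- Chunked by time, so time-range queries only touch the relevant chunks
    SELECT create_hypertable('log_events', 'time');
    -- Retention just drops whole chunks, which is why it's near-instant
    SELECT add_retention_policy('log_events', INTERVAL '90 days');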
I introduced Timescale at an IIOT company, we had thousands of sensors regularly sending data up and wanted to efficiently display such metrics to users on a per-sensor basis, and one "tick" of data had a lot of metrics. Timescale let us go from Postgres for metadata and OpenTSDB (awful, stay far away) for time series to just one Timescale instance for everything. Huge win for us. We had enough data that doing the same with vanilla Postgres would have performed much worse (billions of rows).
We just migrated to ClickHouse. We collect monitoring data - response times from 20+ different locations. We are not super-duper big, but at least 100M+ individual metrics per month. We want to give our users a snappy, interactive dashboard that lets them explore aggregates of that data over time: averages, p99, etc.
I use TimescaleDB to store data from logs (syslog, http logs, render logs), timeseries host metrics (CPU, Disk, Memory, Network), system inventory data, FDW to LDAP, network flows, etc. I do analysis across multiple dimensions for a variety of business uses.
1. Render job logs, combined with host metrics, to determine if any systems are being underutilized.
2. Intranet network usage for monitoring purposes.
3. General monitoring of log events, system metrics, etc with alerting based on specific query criteria.
4. Temperature / Humidity monitoring and trend analysis.
Basically if there are things that happen somewhat often (http logs, system logs, host metrics, power usage, temperature change, etc) and you want to analyze that data, and one of the factors in your analysis is time, then time series databases are for you.
In the end each TSDB has different trade offs, and there are a number of considerations to make, and how you weigh them will change a lot depending on scale. If you are writing 1-20GB/Day vs 1-20TB/Day, you will have different considerations. If you have relatively simple queries vs complex queries with complex joins on other data you will have different considerations. In the end you need to have some idea what your data is, how much of it there is, what the growth looks like and what your write/query usage patterns are going to look like. Insert / Query performance isn't everything, unless for your business it is.
Most useful economic or financial markets data is some kind of time series. For example, what was the maximum price of a stock in the last year, and how does it compare to the current price?
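As a made-up example of that kind of query in plain SQL:

    -- 52-week high for one ticker, to compare against its current price
    SELECT max(price) AS year_high
    FROM stock_prices
    WHERE symbol = 'ACME'
      AND time > now() - INTERVAL '1 year';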
At Netflix all of our monitoring was in a time series database so we could get real time insights into pretty much anything we were monitoring (which was most everything).
I built a TSDB (10,000,000 samples per second, average query time around 200ms) on top of ClickHouse. It supports multiple insert protocols (InfluxDB line protocol | Prometheus remote write | OpenTSDB JSON format), the OpenTSDB query protocol, and incomplete PromQL. ClickHouse is good, but there are things you need to tune: using a memory table to speed up inserts, a high-performance ZooKeeper to improve replicated tables, and sharding data by time series ID (distributed_group_by_no_merge=1) to reduce bootstrap node load.
We capture and store energy readings at ~5second intervals, then display total energy at various time granularity by aggregating the values over minutes, hours, days, months, etc.
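In TimescaleDB-flavored SQL, that kind of roll-up looks roughly like this (table and column names made up):

    -- Total energy per hour for one meter over the last week
    SELECT time_bucket('1 hour', time) AS bucket,
           sum(energy_wh)              AS total_wh
    FROM energy_readings
    WHERE meter_id = 42
      AND time >= now() - INTERVAL '7 days'
    GROUP BY bucket
    ORDER BY bucket;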
We've been working really hard on our launches / releases this month! We called it "Always Be Launching" - we've been aiming for releasing multiple things per week during October :)
However, as a DB where users may store critical data, should you really be "Always be launching"? That sounds a little like FB's "move fast and break things". There's a reason why some of the mission critical open source technologies move slowly.
We actually are only having one database software release this month (TimescaleDB v2.5), which is aligned with our normal database release cadence.
Timescale (the company) also provides a managed cloud offering, as well as Promscale (an observability product built on top of TimescaleDB).
So #AlwaysBeLaunching is a company-wide effort across different product & engineering teams, as well as folks in Developer Advocacy and others (e.g., who worked on this comparison benchmarks).
What might be also interesting is our introduction of Experimental Schema features in TimescaleDB - explicitly so that we can "Move fast, but don't break things" (which is also key to getting good community feedback):
Timescale team member here. We take our responsibility to build a rock-solid platform very seriously. We have multiple "levels" of product within Timescale. At our core, we have the open-source database, TimescaleDB. This product releases on a more deliberate and careful cadence, always making sure that we are optimizing for reliability, security, and performance. This has been our approach since our initial launch [0], where we embraced the mantra "boring is awesome", recognizing that for our users stability and reliability is of paramount importance.
Within the core database, we offer features that are carefully marked as "experimental", which we discuss at length in this blog post [1].
Beyond TimescaleDB, we also offer other products that are more SaaS-y in nature. While they're all based on the rock-solid foundation of TimescaleDB, we are also able to ship new features more quickly because they are UI components that make using the database even easier.
Finally, some of our "launches" are more textual in nature, such as this benchmark, which we have spent months researching and compiling.
We've got a few billion rows in TSDB, pretty happy with it so far. Our workload fits the OLTP workflow more than OLAP though, we're processing / analyzing individual data points from IoT devices as they come in, and then providing various visualizations. This tends to mean that we're doing lots of fetches to relatively small subsets of the data at a time, vs trying to compute summaries of large subsets.
Compression is seriously impressive, we see ~90% compression rate on our real world datasets. Having that data right next to our regular postgres tables and being able to operate on it all transactionally definitely simplifies our application logic.
Where I see a lot of folks run into issues with TimescaleDB is that it does require that your related data models hold on to relevant timestamps. If you want to query a hypertable efficiently, you always want to be able to specify the relevant time range so that it can ignore irrelevant chunks. This may mean that you need to put data_starts_at, data_ends_at columns on various other tables in your database to make sure you always know where you find your data. This is actually just fine though, because it also means you have an easy record of those min / max values on hand and don't need to hit the hypertable at all just to go "When did I last get data for this device".
In practice we've seen it actually improve performance, because when fetching a data range for a device fewer actual rows have to be fetched from the disk. You pick certain columns (like device ID) that remain uncompressed and indexed for rapid querying, and then the actual value columns are compressed for a range of time.
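A sketch of that pattern (hypothetical table and column names; the application looks the range up first, then passes explicit bounds to the hypertable query):

    -- Keep the known data range on the metadata table...
    ALTER TABLE devices
        ADD COLUMN data_starts_at timestamptz,
        ADD COLUMN data_ends_at   timestamptz;

    -- ...answer "when did I last get data?" without touching the hypertable...
    SELECT data_starts_at, data_ends_at FROM devices WHERE id = 42;

    -- ...and always bound hypertable queries with an explicit time range
    -- so irrelevant chunks are excluded.
    SELECT time, value
    FROM measurements
    WHERE device_id = 42
      AND time BETWEEN '2021-09-01' AND '2021-10-01';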
> This may mean that you need to put data_starts_at, data_ends_at columns on various other tables in your database to make sure you always know where you find your data.
Do you have a link to docs for this? Does this mean literally putting a first column named (xstartx) and an end column (xendx) as the last column? How do you then utilize it?
Can somebody recommend a database suitable for an event sourced application:
* One series of events per user
* Each series grows at about 10 events/minute while the user is active
* Fancy queries are not required, typically a user's event series is consumed in order to update aggregate state for that user
* Either used online, adding events one at a time and needing to immediately update the aggregate state
* Or used offline, syncing a batch of hours or days at once; when syncing a large time interval, eventually consistent state updates are acceptable
* It must be possible to delete a user's data, regardless how old it is (a nightly batch job deleting multiple users at once is fine, if it helps performance)
* Migrating old data should be possible with reasonable performance and without consuming excessive temporary memory
* Compact storage is important (simple zstd compression should suffice, though columnar compression might be slightly better)
* Being able to use a cheaper object store like S3 for old data would be nice
At a glance timescale community appears to meet most requirements. The userid can be used as `segmentby` key, and the data compressed via columnar compression. But it seems to have limitations with migration (sounds like it requires me to manually decompress and recompress chunks, instead of simply transforming one (chunk, segment) piece at a time) and deletion (I need to delete everything with a specific `segmentby` key).
Alternatively there is the DIY approach, of serializing each entry in a compact format, one file per user, and then once data is old enough compress it (e.g. with zstd) and upload it to S3.
Looks like you edited this with some more detail, so I'll answer higher.
Compression in TimescaleDB used to mean all compressed data was immutable and the table schema couldn't be altered. Since TimescaleDB 2.1 and 2.3, that has changed in a few ways.
- Schema can now have columns added or renamed
- Compressed chunks can now have rows inserted into them (partially compressed, the background job will fully compress when it runs)
Row deletion is not possible yet, but I've personally been having some internal conversations about ways to do exactly what you're suggesting in the near term: deleting rows based on a "segmentby" column. I have some testing to do... but my focus has been taken up by a certain 33-minute-long-read blog post.
Feel free to join our Slack and DM me if you want to talk about it further.
Updating many/all rows as a rare maintenance task, typically as part of deploying a new version of the application.
I know Timescale has native support for the most common cases (adding nullable columns/renaming columns). But sometimes the transformation is more complex. Sometimes an SQL UPDATE statement would suffice; sometimes streaming each segment in chronological order to the application and returning the updated rows might be required.
I don't think performance is the issue, but I'd like to keep the storage small/cheap via compression, since this is a hobby project. (though admittedly it's probably premature optimization)
I think the name originally came from the company/project origin and how the whole thing started... Recently set up as its own entity, ClickHouse came from Yandex. There are probably others better able to give that history, but that's the gist of it.
I was surprised to see that ClickHouse and ElasticSearch have the same number of contributors. That's pretty astounding given how much older and more prominent ElasticSearch has been.
Edit: I was very off. The Github contributor graph does not show all actual contributors. ElasticSearch has somewhere around 2-3 times as many contributors as ClickHouse.
Clickhouse is by far the leading open source columnar SQL data warehouse at this point. We have had strong open source operational SQL DBs for many years (MySQL, Postgres), but no open source systems that mirrored closed source MPP columnstore until clickhouse. Its interesting that it took "this long" for a strong open source SQL DW to emerge.
ClickHouse has gained a huge following and honestly that's been pretty well earned. For the kinds of apps that they target it's a great choice, it's great technology with a very able team behind it.
As per the landing pages of the projects, ES has 1.6k contributors whereas ClickHouse has 803. The contributors page likely only lists the top contributors to keep the page load time manageable.
Go to one of the several public ClickHouse endpoints and run this query:
    -- Elastic vs ClickHouse in a single table.
    SELECT toYear(created_at) Year,
        uniqIf(creator_user_login, repo_name in ('elastic/elasticsearch')) "Elastic",
        uniqIf(creator_user_login, repo_name in ('yandex/ClickHouse', 'ClickHouse/ClickHouse')) "ClickHouse"
    FROM github_events
    WHERE event_type = 'PullRequestEvent'
    AND merged = 1
    AND repo_name in ('yandex/ClickHouse', 'ClickHouse/ClickHouse', 'elastic/elasticsearch')
    GROUP BY Year ORDER BY Year
You can access the ClickHouse web UI for this dataset here: https://github.demo.trial.altinity.cloud:8443/play?user=demo. The password is "demo" (type it in the left side.) This is the Altinity.Cloud copy of Alexey Milovidov's excellent github_events dataset.
When I run this query I get the following numbers.
Related to TimescaleDB, there was a blog post which explained their internals and also compared with another similar time series DB. I can't seem to find the link, anyone remembers?
Would either of these database systems be proper for a case where you have a mix of large measurement data and small reference/master data that you need to join, filter, etc.? Example:
    SELECT r.country, m.time, SUM(m.measurement)
    FROM measurement_table AS m
    INNER JOIN reference_table AS r
        ON m.device_id = r.device_id
    GROUP BY r.country, m.time
In its current form/state, ClickHouse is not optimized for typical JOIN-type queries, a point we make in the post. You would have to rewrite your statement to get better performance. The other main point is that all data is "immutable", so if your reference data needs to be updated, it would still need to go through some kind of asynchronous transform process to ensure you're getting the correct values at query time.
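For illustration only (not from the post), one common ClickHouse-style rewrite is to keep the big table on the left and shrink the reference table into a subquery on the right, since ClickHouse builds its hash table from the right-hand side:

    SELECT r.country, m.time, sum(m.measurement) AS total
    FROM measurement_table AS m
    INNER JOIN
    (
        SELECT device_id, country
        FROM reference_table
    ) AS r USING (device_id)
    GROUP BY r.country, m.time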
TimescaleDB is PostgreSQL, so it can easily handle this kind of join aggregate like you would expect. If "m.measurement" was compressed, historical queries with a time predicate would likely be faster than in the uncompressed state.
1) ClickHouse lets you trivially set up a batch buffer on the server side. It will always be faster on insert with lower disk usage.
2) Suspicious axis on the query benchmark. It seems to represent the difference in query latencies, so if one database executes a query in 1.00 and another in 1.34, on the graph it shows up as a whopping 34.
Last time I looked at TimescaleDB a few months ago, it appeared to me that you had to ingest data into what they call an access node, which made it a non-starter for our use case, but matched perfectly with ClickHouse, which is a symmetric deployment, where every node can be an ingest node.
You are correct. The current multi-node deployment (if you need it for your workload) does have one "managing" access node. All ANs and DNs can be replicated and configured with tooling such as patroni for HA.
We have a few users that have larger multi-node clusters setup this way (one at 40+ nodes so far) and happy with the offering overall. Obviously YMMV depending on requirements/IT support.
This is generally true for most databases these days. Use an operator if it's available. Helm can't handle the dynamic management required to run databases properly.
Could putting RedPanda/Kafka in front of ClickHouse make it insert benchmark winner? Of course it means operational expenses but I wonder if this route is worth exploring?
What is the difference w.r.t. the comparison done by Altinity of ClickHouse with Timescale? ClickHouse performed better there for the same test. What gives?
Thank you. My only nit is the way the ratio (CH/TS) is shown. What is the purpose of that? It will show a bigger percentage for cases in which TS is better, but a lower percentage for cases where CH gives better results. From a data representation perspective, I do not think that is fair.
The two big things, which we discuss at length in the post, are:
- Altinity (and others) did not enable compression in TimescaleDB, which converts data into columnar storage and improves queries over historical data because individual columns can be retrieved in compressed format, similar to CH
- They didn't explore different batch sizes to help understand how each database is impacted at various batch sizes.
Have you from your side followed all Clickhouse best practices?
ClickHouse's design in particular suggests doing an ingest request approximately once per second; if you do much more than that, you are using it outside of its intended usage, and if you need that you usually have some sort of queue between whatever produces the data and ClickHouse.
Note that ingest in small batches can also significantly affect query performance.
Yep - it's all detailed in the post! The question is how it compares to TimescaleDB, which is an OLTP time-series database that has a lot of other possible use cases (and extensibility). I think it's very fair to explore how smaller batches work since others haven't ever actually shown that (as far as we can see) so that users that would normally be coming from a database like PostgreSQL can understand the impact something like small batches would have.
As for ingest queueing, TSBS does not queue results. We agree, and tell most users that they should queue and batch insert in larger numbers. Not every app is designed that way and so we wanted to understand what that would look like.
But CH did amazingly well regardless of that with batches above 1k-2k and lived up to its name as a really fast database for ingest!
Howdy! All of the details about our TSBS settings are in the performance section of the docs. Also, we'll be streaming a sample benchmark of the two databases next Wednesday at 10AM ET/4PM CET.
> We tried multiple batch sizes and found that in most cases there was little difference in overall insert efficiency
This is wrong in the CH world, where batch size matters a lot. I would recommend making this even higher, around 10x the current value.
Humble suggestion: there are many things not quite properly interpreted about CH, and reading through the blog it seems like you're focusing more on areas where CH is lacking/missing. Please don't do that.
- The code that TSBS uses was contributed by Altinity[1]. If there is a better setup, please feel free to submit a PR. As stated elsewhere, we did have a former CH engineer review and even updated ClickHouse to the newest version __yesterday__ based on his suggestion to ensure we had the best numbers. (and some queries did improve after upgrading, which are the numbers we presented)
- It seems like you read the article (great job - it was long!!), so I'm sure you understand that we were trying to answer performance and feature questions at a deeper level than almost any benchmark we've seen to date. Many just show a few graphs and walk away. We fully acknowledged that smaller batches are not recommended by CH, but something many (normally OLTP) users would probably have. It matters and nobody (that we know of) has shown those numbers before. And in our test, larger batch sizes do work well, but not to some great magnitude in this one server setup. Did 10k or 20k rows maybe go a little faster for CH? Sometimes yes, sometimes negligible. The illustration was that we literally spent months and hundreds of benchmark cycles trying to understand the nuances.
I think we're pretty clear in the post that CH is a great database for the intended cases, but it has shortcomings just like TimescaleDB does and we tried to faithfully explore each side.
This is what tends to make all vendor benchmarks "benchmarketing" - while many of us fully intend to give a fair shot to other technologies, we tend to know best practices for our own software better than the "competition".
We used the Time Series Benchmark Suite for all these tests https://github.com/timescale/tsbs. Also, Ryan (post author) will be giving all the config details in a Twitch stream happening next Wednesday. We'll be uploading the video to Youtube immediately afterwards too >>
Apache Druid and Apache Pinot are two others to consider for time series. We're using druid at scale and it works pretty well. Pinot appears to be faster for retrieval but it is less mature.
Having used both TSDB and ClickHouse in anger I have some thoughts on this:
They are both fantastic engines, I really like that both have made very specific tradeoffs and can be very clear in what they are good and bad at. Having worked on database engines, I can appreciate the complexity that they are solving.
My most recent use is with ClickHouse, which is great and I think a complete game-changer for the company. However there's a lot of issues (that are being worked on, the core team is great, though there are a few personalities that are a bit frosty to deal with). All of these comments come with love for the system.
1. Joins really need some work, both in the kinds of algorithms (pk aware, merge joins that don't do a full sort etc.), and in query optimizer work to make them better. We have analysts that use our system, and telling them to constantly write subqueries for simple joins is a total PITA. Not having PK aware joins is a massive blocker for higher utilization at our company, which really loves CH otherwise.
2. Some personalities will tell you that not having a query optimizer is a feature, and from an operational standpoint, it is nice to know that a query plan won't change, or try and force the optimizer to do the right thing. However, given #1, making joins performant (we have one huge table with trillions of rows, and a few smaller ones with billions) is really rough.
3. The operations story really needs some work, especially the distribution model. The model of local tables with a distributed table over them is difficult to work with personally (see the sketch after this list). It would be nice to just be able to plug servers in without a lot of work, like Scylla, and not have two tables whose schemas you have to keep consistent. There's also just some odd behavior: if you insert async into a distributed table and only have a few shards, it'll only use a thread per shard to move that data over. It would be nice if there wasn't as much to think about.
4. Following #3, there's just too many knobs, maybe if they had a tuning tool or something that would help, but configuring thread pools is difficult to get right. I suspect CH could use a dedicated scheduler like Scylla's, that could dispatch the work, instead of relying on the OS.
5. The storage system relies a lot on the underlying FS and settings on when to fsync etc. I suspect if they had a more dedicated storage engine (controlled by the scheduler above), things could be more reliable. I still don't fully trust data being safe with CH.
6. Deduplication - This is a hard problem, but one that is really difficult to solve in CH. We solve it by having our inserters coordinate so that they always produce identical blocks, using replacing merge trees to catch stragglers (maybe), but it isn't perfect. A suggestion if possible is to try and put the same keys into the same parts, so they'll always get merged out by the replacing merge tree (I understand this is difficult).
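(To make point 3 concrete, the local + distributed two-table pattern looks roughly like this, with hypothetical cluster/table names:)

    -- One MergeTree table per node...
    CREATE TABLE events_local ON CLUSTER my_cluster
    (
        ts    DateTime,
        id    UInt64,
        value Float64
    )
    ENGINE = MergeTree
    ORDER BY (id, ts);

    -- ...plus a Distributed table over it that fans inserts/queries out,
    -- and whose schema has to be kept in sync with the local one.
    CREATE TABLE events ON CLUSTER my_cluster AS events_local
    ENGINE = Distributed(my_cluster, currentDatabase(), events_local, rand());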
The CH team is great, and these will be fixed in time, but these were the problems we ran into with CH.
TSDB was really solid, but we never used it at a scale where it would tip over. Our use case is really aligned with Yandex's so a lot of the functionality they have built is useful to us in a way that TSDB's isn't. (Also, being able to page data to S3 is amazing).
Thanks for your excellent contribution to this discussion. As the post author I wholly agree with your approach: if a solution hits the sweet spot for you in the context of your requirements that's the one you choose. Thank you for considering TimescaleDB alongside ClickHouse in what was obviously a well thought through assessment of these two excellent technologies.
Thanks for the great, thoughtful feedback. We (Timescale) couldn't agree more that there is a lot to love about ClickHouse, especially where it truly excels.
Information like this is helpful for others currently in the "choose the right tool" part of the job and to the developers of the product. I can't imagine how different all of our offerings will look in a few more years! :-)
If you have thousands of clients writing to the database individual rows, one per request, and thousands of clients making queries (some of them are complex, some are not). Does ClickHouse even get used in this scenario?
I’m surprised Timescale hasn’t given a comparison with SingleStoreDB.
I’ve found SingleStore column scans at parity with ClickHouse in speed. At same time SingleStore uses a hybrid skip-list, columnstore data structure in their universal storage (which is default table format).
So you have high throughput transactions, as well as insanely fast aggregate scans.
Usually in column stores, they are great at append, not so much with updates and deletes.
2. Restrictions. You acknowledge that the Software, and its structure, organization, and source code, constitute SingleStore’s and its suppliers’ valuable trade secrets, and the Software is subject to the following restrictions. Specifically, Customer shall not, and shall not
...
conduct any competitive analysis, publish or share with any third party any results of any technical evaluation or benchmark tests performed on the Software, or disclose Software features, errors or bugs to a third party without SingleStore's prior written consent ("Benchmarking"); or