
When you write to a file, you generally don't write to physical storage directly. Instead, the writes are buffered in memory and written to physical storage in batches. This substantially improves performance but creates a risk: if there is some sort of outage before the data is flushed to disk, you might lose data.

In order to address that risk, you can explicitly force data to be written to disk by calling fsync. Databases generally do this to ensure durability and only signal success after fsync succeeded and the data is safely stored.
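
To make that concrete, here is a minimal Python sketch of the "only signal success after fsync" pattern. The helper name `durable_write` is made up for illustration, and this is not how ClickHouse or any particular database implements it; it only shows the order of the calls.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync has succeeded.

    Hypothetical helper for illustration; real databases add error
    handling, batching, and usually a write-ahead log on top of this.
    """
    with open(path, "wb") as f:
        f.write(data)         # data sits in Python's userspace buffer
        f.flush()             # hand the data to the OS (page cache)
        os.fsync(f.fileno())  # ask the OS to flush it to the storage device
    # Only now would a database report the write as successful.


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.bin")
    durable_write(target, b"important record\n")
    print("write acknowledged:", target)
```

Without the `os.fsync` call the write would still usually reach the disk eventually, but an outage in between could lose it even though the caller was already told it succeeded.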

So ClickHouse not calling fsync implies that it might lose data in case of a power outage or a similar event.



Most ClickHouse installations run replication for availability and read scaling. If you do get corrupted data for some reason, you can read it back from another replica. That's much more efficient than trying to fsync transactions, especially on HDD. The performance penalty for fsyncs can be substantial and most users seem to be pleased with the trade-off to get more speed.

This would obviously be a poor trade-off for handling financial transactions or storing complex objects that depend on referential integrity to function correctly. But people don't use ClickHouse to solve those problems. It's mostly append-only datasets for analytic applications.


Is this really that important, though, since all servers are fed from an uninterruptible power supply and most data centers have multiple power sources?


It’s a significant deviation from what I would expect from a disk-oriented database. So I would definitely expect it to be well documented, along with the reason for it, why the developers believe it is a reasonable (or even safe) choice, and what assumptions went into that (such as the availability of an uninterruptible power supply).

Additionally, keep in mind that with something like EBS, most people are probably on network-attached storage, so fsync involves the network. An outage doesn’t just mean a power outage; it could also be a network issue.



