Because it front-loaded all operations so that they happen outside of the benchmark. Depending on what you want to do that can make sense, but the original intention of the benchmark was a brute-force query benchmark.
Have you seen the sticker on the NUC? 116 billion rows per second at 233.61 GB/s. If you spend even a single second thinking about how absurd that number is, you'll see that the two benchmarks measure completely different things. Even with a quad-channel Xeon CPU you won't see significantly more than 100 GB/s of memory bandwidth. Those 116 billion rows per second didn't actually happen; it's a synthetic number. The result of the query was computed during insertion of each temperature record, before the benchmark had even started, and then they calculated the theoretical number of rows you would have to scan for an equivalent result and slapped that fictional number on their NUC.
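In SQL terms, the trick looks roughly like this (a sketch with made-up table and column names, not the actual benchmark code): keep running totals as each record arrives, so the benchmark-time "query" only touches a few pre-aggregated rows.

    -- Running totals maintained at insert time; station/temperature
    -- names are hypothetical.
    CREATE TABLE station_totals
    (
        station  String,
        sum_temp Float64,
        n        UInt64
    )
    ENGINE = SummingMergeTree()
    ORDER BY station;

    -- Each reading is inserted as (station, temp, 1) and the engine
    -- collapses duplicates into running sums in the background. The
    -- final "query" then reads a few thousand aggregate rows instead
    -- of billions of raw ones:
    SELECT station, sum(sum_temp) / sum(n) AS avg_temp
    FROM station_totals
    GROUP BY station;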
That's a sticker from a ClickHouse community event, not related to the benchmark. We tend to stick them on anything flat. My ancient Dell XPS 13 has one. It's definitely not that fast.
That said, the sticker is from a real performance test. I assume it was a cluster but don't have details. ClickHouse query performance is outstanding; it's not hard to scan billions of rows per second on relatively modest hosts. These are brute-force queries on the source data, with no optimization from materialized views or indexes.
For instance, I have an Amazon m5d.2xlarge with 8 vCPUs, 32 GB of RAM, and EBS gp2 storage rated at 100 IOPS. I can compute the average number of passengers on the benchmark NYC taxi cab dataset [1] in 0.551 seconds using direct I/O. The throughput is 2.37B rows/sec.
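For reference, the query is just a straight scan, something like the following. The trips table and passenger_count column follow the usual ClickHouse taxi dataset examples, and the setting shown is one way to force direct I/O; treat the exact names as assumptions about my setup.

    -- Brute-force average over the full dataset, forcing direct I/O
    -- so the page cache doesn't flatter the numbers.
    SELECT avg(passenger_count)
    FROM trips
    SETTINGS min_bytes_to_use_direct_io = 1;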
ClickHouse is so fast on raw scans that many production users don't even use materialized views. I mostly use them to get response times down to a few milliseconds for demos.
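When I do want milliseconds, a materialized view that pre-aggregates at insert time does the job. Here's a minimal sketch against the same assumed taxi table; view and column names are made up for illustration.

    -- Pre-aggregate average passengers per day as data is inserted.
    -- POPULATE backfills from rows that already exist.
    CREATE MATERIALIZED VIEW trips_daily_mv
    ENGINE = AggregatingMergeTree()
    ORDER BY day
    POPULATE AS
    SELECT
        toDate(pickup_datetime) AS day,
        avgState(passenger_count) AS avg_passengers
    FROM trips
    GROUP BY day;

    -- Query-time work is now trivial: merge a few thousand partial
    -- aggregate states instead of scanning a billion raw rows.
    SELECT day, avgMerge(avg_passengers) AS avg_passengers
    FROM trips_daily_mv
    GROUP BY day
    ORDER BY day;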