With TimescaleDB compression, 1000 rows of uncompressed data are compressed into column segments, moved to external TOAST pages, and then pointers to these column segments are stored in the table's "row" (along with other statistics, including some common aggregates).
So while the query processor might still be "row-by-row", each "row" it processes actually corresponds to a column segment for which parallelization/vectorization is possible. And because these column segments are TOASTed, the row itself holds just pointers, and you only need to read in the compressed column segments for the columns you are actually SELECTing.
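For anyone wanting to poke at this: you opt a hypertable into compression and tell TimescaleDB how to segment and order the rows that get packed into those per-column segments. A minimal sketch, assuming a hypothetical metrics(time, device_id, value) hypertable (names made up, option spellings may vary slightly by TimescaleDB version):

    -- Opt the hypertable into native compression; segmentby/orderby control
    -- how rows are grouped into the per-column segments described above.
    ALTER TABLE metrics SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'device_id',
      timescaledb.compress_orderby   = 'time DESC'
    );

    -- Either compress older chunks by hand...
    SELECT compress_chunk(c)
    FROM show_chunks('metrics', older_than => INTERVAL '7 days') AS c;

    -- ...or let a background policy do it.
    SELECT add_compression_policy('metrics', INTERVAL '7 days');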
Anyway, you might have known this already; just wanted to clarify. Thanks for the discussion!
Yeah, very interesting. I was wondering how Timescale pushed Postgres more towards columnar storage without rewriting a bunch of Postgres itself.
My understanding of TOAST is that it is itself just a bunch of rows in a TOAST table that split the compressed "row" (or in this case "1000 rows of 1 column") across as many rows as required to store the data while remaining within the Postgres page size limit (normally 8 kB).
With the often-quoted Postgres per-row overhead of ~23 bytes, which you would have to pay for each TOAST row as well, does this not add up and eat into your storage efficiency? Or does compression work so well that the ~23 bytes × (N + 1) rows (1 row pointing to TOAST + N TOAST rows) required to store the "row" don't matter?
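I suppose you could measure this directly, something along these lines (chunk and hypertable names made up; the stats function is from TimescaleDB 2.x and its exact columns may differ by version), comparing a chunk's heap size against its TOAST table and looking at Timescale's own before/after numbers:

    -- Size of a chunk's heap vs. its TOAST table (replace with a real chunk name).
    SELECT c.relname,
           pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
           pg_size_pretty(pg_relation_size(t.oid)) AS toast_size
    FROM pg_class c
    JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relname = '_hyper_1_1_chunk';

    -- Overall before/after compression stats for a hypertable.
    SELECT * FROM hypertable_compression_stats('metrics');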
Does Timescale do its own compression algorithm too? I see that in PG 14 TOAST column compression can be LZ4 instead of the out-of-the-box pglz, which apparently has a few problems; I see mentions on the mailing list of significant possible optimizations. When dealing with EBS-style storage, where read latencies can be multiple milliseconds, compression is always going to be a win, but it's an easy optimization either way, I'd think.
Timescale implements its own compression algorithms. It includes several, and automatically chooses the algorithm based on each column's data type:
- Gorilla compression for floats
- Delta-of-delta + Simple-8b with run-length encoding compression for timestamps and other integer-like types
- Whole-row dictionary compression for columns with a few repeating values (+ LZ compression on top)
- LZ-based array compression for all other types
This means that even within the same table, different columns will be compressed using different algorithms based on their type (or inferred entropy).
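You can get an intuition for why delta-of-delta works so well on timestamps with a quick plain-Postgres query (just an illustration, not Timescale's actual code path): for regularly spaced samples the first delta is constant and the second delta is almost always zero, which then run-length encodes to nearly nothing.

    -- Delta-of-delta on evenly spaced timestamps, illustrated with window functions.
    WITH samples AS (
      SELECT ts
      FROM generate_series('2024-01-01 00:00:00+00'::timestamptz,
                           '2024-01-01 00:01:00+00'::timestamptz,
                           '10 seconds') AS ts
    ), deltas AS (
      SELECT ts, ts - lag(ts) OVER (ORDER BY ts) AS delta
      FROM samples
    )
    SELECT ts,
           delta,                                           -- constant 10s
           delta - lag(delta) OVER (ORDER BY ts) AS delta2  -- almost always zero
    FROM deltas;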