Big Data Storage and Transfer Formats

scrappyjoe · on Oct 22, 2022

You don’t need to shut down the ‘generating process’ to read Parquet. Arrow, which has implementations in many languages, has the notion of a dataset: a directory of files (Parquet, Arrow IPC, or CSV) that is read as a single table.

Just keep adding more files to the directory, and your dataset will grow.

DuckDB can connect to the dataset and query it directly, so you’re all good.

adammarples · on Oct 22, 2022

Pretty sure DuckDB can read and write Parquet, if you were loving DuckDB but worried about Parquet accessibility.

anotherhue · on Oct 22, 2022

Play “Big Data or Pokémon?” to learn more.

https://pixelastic.github.io/pokemonorbigdata/