This came up in a Twitter discussion, too. One of the replies links to an interesting paper, which I've copied below.
> As someone who work with C++ parquet readers and writers I say that different configurations can easily result in 10x differences in size -- the default behavior is usually very poor (Tony Wang / @marsupialtail_2) https://twitter.com/marsupialtail_2/status/17021850155038883...
> TUM have written a great summary about performance niches around parquet file IO performance, @DatabendLabs adopted a lot of optimizations mentioned in the summary and result is great in practice. https://dl.gi.de/server/api/core/bitstreams/9c8435ee-d478-4b... (zhihanz / @zhihanz1205) https://twitter.com/zhihanz1205/status/1702196118472536166