
Your article does not mention how much runtime improvement you observed. Can you share those numbers?


With the 2-pass strategy, we can write arbitrary row group sizes while using a fixed amount of memory, with probably 100-200 MiB of overhead for the Parquet file processing, depending on how large the metadata is for the scratch file. Without the 2-pass strategy, the amount of memory is proportional to the size of the row group.
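To illustrate why a second pass can decouple memory use from row group size, here is a rough sketch. It is not the implementation from the article and does not use a Parquet library: it assumes the first pass spills small row chunks to scratch storage and the second pass streams each column into the final row group. The names `CHUNK_ROWS` and `write_row_group_two_pass`, the per-column scratch files, and the fixed-width integer encoding are all hypothetical choices made for the example.

```python
import io
import struct
import tempfile

CHUNK_ROWS = 4  # hypothetical fixed buffer size, in rows


def write_row_group_two_pass(rows, num_cols, out):
    """Write one column-major "row group" of int64 values to `out`.

    Returns the peak number of values held in memory at once, which
    depends on CHUNK_ROWS and num_cols, not on the number of rows.
    """
    peak = 0
    # Assumption: one scratch file per column (a real writer may use one file).
    scratch = [tempfile.TemporaryFile() for _ in range(num_cols)]
    try:
        buf = []

        def spill():
            # Append the buffered values of each column to its scratch file.
            for col, f in enumerate(scratch):
                f.write(struct.pack(f"<{len(buf)}q", *(r[col] for r in buf)))
            buf.clear()

        # Pass 1: buffer at most CHUNK_ROWS rows, then spill to scratch.
        for row in rows:
            buf.append(row)
            peak = max(peak, len(buf) * num_cols)
            if len(buf) == CHUNK_ROWS:
                spill()
        if buf:
            spill()

        # Pass 2: stream each column from scratch into the output, so the
        # final layout is column-major regardless of how many rows there are.
        for f in scratch:
            f.seek(0)
            while True:
                data = f.read(CHUNK_ROWS * 8)
                if not data:
                    break
                peak = max(peak, len(data) // 8)
                out.write(data)
    finally:
        for f in scratch:
            f.close()
    return peak


if __name__ == "__main__":
    out = io.BytesIO()
    rows = ((i, i * 10) for i in range(10))
    print(write_row_group_two_pass(rows, 2, out))
```

A single-pass writer that produces the same column-major layout would have to hold every value of the row group until the last row arrives, which is where the memory proportional to row group size comes from.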



