Hacker News

The memory expectations of so many programmers going into pandas baffle me. In particular, no one was batting an eye that a 700 MB CSV file would take gigabytes to hold in memory. Just convincing them to specify the dtypes and to use categoricals where appropriate has cut many of our memory requirements by over 70%. Not shockingly, things go faster, too.
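A minimal sketch of what specifying dtypes at load time can look like. The column names and data here are made up for illustration, and the exact savings will depend on your data and pandas version:

```python
import io
import pandas as pd

# Hypothetical data: a small CSV with low-cardinality string columns.
cities = ["Oslo", "Lima", "Pune"]
statuses = ["open", "closed"]
rows = ["city,status,count"] + [
    f"{cities[i % 3]},{statuses[i % 2]},{i % 100}" for i in range(10_000)
]
csv_text = "\n".join(rows)

# Default load: pandas infers types, typically a generic string/object
# dtype for text and int64 for integers.
default_df = pd.read_csv(io.StringIO(csv_text))

# Explicit dtypes: category for repeated strings, a narrower integer type.
# int8 is safe here only because the values are known to stay below 128.
typed_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"city": "category", "status": "category", "count": "int8"},
)

default_bytes = default_df.memory_usage(deep=True).sum()
typed_bytes = typed_df.memory_usage(deep=True).sum()
print(f"default: {default_bytes:,} bytes, typed: {typed_bytes:,} bytes")
```

Passing `deep=True` matters for the comparison: without it, pandas may not count the memory held by the string values themselves.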

If there are efforts to help this be even better, I heartily welcome them.



When I'm teaching Pandas, the first thing we do after loading the data is inspect the types. Especially if the data is coming from a CSV. A few tricks can save 90+% of the memory usage for categorical data.

This should be a step in the right direction, but it will probably still require manually specifying types for CSVs.
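For anyone who wants to try the "inspect the types first" step, here is a rough sketch on a made-up frame (the column names are hypothetical, and this is just one way to do it, not necessarily what the course covers):

```python
import pandas as pd

# Hypothetical frame standing in for freshly loaded CSV data.
df = pd.DataFrame({
    "country": ["NO", "PE", "IN", "NO"] * 2_500,
    "visits": list(range(100)) * 100,
})

# Step 1: look at what pandas inferred and how much memory it costs.
print(df.dtypes)
before = df.memory_usage(deep=True).sum()

# Step 2: convert repeated strings to category and downcast integers
# to the smallest integer type that holds the observed values.
df["country"] = df["country"].astype("category")
df["visits"] = pd.to_numeric(df["visits"], downcast="integer")

after = df.memory_usage(deep=True).sum()
print(df.dtypes)
print(f"{before:,} -> {after:,} bytes")
```

`df.info(memory_usage="deep")` gives a similar per-frame summary in one call. Category only pays off when a column has few distinct values relative to its length.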


Yeah, I expect most efforts will just make the pain less painful. Specifying the data types is not some impossible task, and it can also help with other things.



