Not the OP, but moving to sparse matrices is probably going to give you the most bang for your buck. I strongly suspect those huge dataframes could be stored far more efficiently in a sparse format, since most of their entries are likely zeros (one-hot features, counts, etc.).
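
For example, a rough sketch of what the conversion can buy you with pandas/scipy (the shape and sparsity here are invented for illustration):

    import numpy as np
    import pandas as pd
    from scipy import sparse

    # Hypothetical mostly-zero dataframe (e.g. one-hot encoded features).
    df = pd.DataFrame(np.zeros((50_000, 200)))
    df.iloc[::1000, 0] = 1.0  # a handful of non-zero entries

    dense_bytes = df.to_numpy().nbytes

    # CSR keeps only the non-zero values plus their row/column indices.
    csr = sparse.csr_matrix(df.to_numpy())
    sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

    print(f"dense:  {dense_bytes / 1e6:.1f} MB")   # ~80 MB
    print(f"sparse: {sparse_bytes / 1e6:.3f} MB")  # a small fraction of that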

To be fair, that's one of the reasons the Spark ML stuff works quite well. Be warned though: estimating how long a Spark job will take, and how many resources it will need, is a dark, dark art.
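
A minimal PySpark sketch of the sparse-vector path, if you go that route (feature dimension and values invented for illustration):

    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("sparse-demo").getOrCreate()

    # SparseVector(size, indices, values): only the non-zero entries are
    # stored and shipped between executors.
    rows = [
        (1.0, Vectors.sparse(10000, [3, 42], [1.0, 2.5])),
        (0.0, Vectors.sparse(10000, [7], [4.0])),
    ]
    df = spark.createDataFrame(rows, ["label", "features"])

    # Spark ML estimators accept sparse and dense vectors interchangeably.
    model = LogisticRegression(maxIter=10).fit(df)
    print(model.coefficients.numNonzeros())

None of which helps with the resource-estimation problem, of course; you still end up tuning executor memory by trial and error.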


