Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I don't know about _eye opening_, but The Art of PostgreSQL really changed the way I work for the better. Like a lot of people, I used to be one that would pull all my data into Python for processing, Pandas-style. Once I learned how to do it all directly in PG everything became trivial.

[0] https://theartofpostgresql.com/



Do you have some examples of that ?

I always found that SQL is better for data extraction -- selecting and filtering approximately what I want to analyze.

But for in-depth analysis, R or Python is better. They're just much better languages with more control than SQL.


What if you need all columns of a few rows, but the rows chosen as the ones satisfying some condition on a few columns across all the rows?

Maybe you could select pk,a,b,c; identify the ones you want in Python; select * where pk in those; and finish doing whatever in Python - but now you've got two round trips to the database.

Obviously that probably doesn't matter if you're doing some 'offline' analytics, but if you're serving a request in a web app it's going to be a lot slower than if you can push that logic into the db, and make a single query.


Even just learning how to use CTEs effectively has been a major improvement for my daily work. Building up a dataset iteratively in clearly defined steps and performing aggregations, transformations, orderings, etc along the way is just a very clean way to work. And it's easy to short-circuit at any step for debugging if you're not getting the results you expect. I often work with datasets larger than the amount of memory on my computer so has benefits there as well instead of trying to load it all into Python memory to work with it.

[0] https://www.postgresql.org/docs/9.1/queries-with.html


Yes CTEs are nice for sure ... But for fine-grained analysis, I still would use them to do any joins I don't want to do on the client, cut down the data size, then export to a "real language" for the details

IMO SQL and tidyverse in R are an ideal combo; this post compares Pandas and tidyverse:

https://www.oilshell.org/blog/2018/11/30.html

I agree that using Python for everything is suboptimal. I'm a big Python user, but for analytics I mostly skip Python, and use SQL and R




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: