> Being able to go from duckdb to pandas very quickly to make operations that make more sense on the other end and come back while not having to change the format is super powerful

I can't stress enough how transformative I think this is. It's a nice working pattern in general, but much more importantly it dramatically shrinks the scope of the problem each tool has to solve. Pandas doesn't need to do everything, nor does DuckDB, nor does some niche tool built for very specific forecasting: any of them can be slotted into an in-memory chain of processes with little or no conversion overhead. This lowers the barrier to entry for new tools, so people with deep knowledge of just their own area should find them quicker and easier to write.

It extends beyond this too, since you also get data serialisation for free. I can read data from a file with DuckDB, make a remote gRPC call to an Arrow Flight endpoint written in a few lines of Python that does whatever processing it needs on the Arrow data, and get back Arrow data that gets fed into something else... all very easily. I'm sure there are bumps and leaky abstractions if you go deep here, but I've absolutely got remote querying of dataframes and files working with a few lines of code, calling DuckDB on my local machine through ngrok from a Colab instance.
