This sort of thing is why looking at generated SQL while developing instead of just trusting the ORM to write good queries is so important.
I find query planning (and databases in general) to be very difficult to reason about, basically magic. Does anyone have some recommended reading or advice?
This website taught me a ton, even after I thought I knew more than enough about performance. Just seeing how different databases generate and execute their SQL is a huge boon (and sometimes extremely surprising when looking at one DBMS to another).
https://pages.cs.wisc.edu/~dbbook/ is a great overview, specifically chapters 13 and 14 on query optimization. it is difficult to reason about though, and every compiler is different. it takes time and enough examples to look at a query and have an intuition for what the plan should look like, and if there's something the compiler is not handling well.
If you are looking to squeeze every ounce of performance from your entire application stack, I'd say you should be looking at everything your ORM produces. The ORM is basically to speed up your developers time to production, but most ORMs will have some cases where they generate terrible SQL, and you can usually run your own SQL in a stored procedure if the generated SQL is sub-optimal. I've done this quite a few times with Microsoft's Entity Framework, but as new versions come out, it's become less common for me to have to do this. Usually I need to drop to a stored procedure for code that allows searching a large number of columns, in addition to sorting on all the columns that display. I also use stored procedures for multi-table joins with a WHERE clause, when using Entity Framework. You still need to look at your generated queries, but the code is nothing like it used to be under Entity Framework under the .NET Framework (at least in my experience - YMMV - you should never just let your ORM create SQL without reviewing what it is coming up with).
ORM often produces horrible queries that are impossible for humans to digest. I think there are two factors. First, queries are constructed incrementally and mechanically. There is no an overview for the generator to understand what developers want to compute, or no channel for developers to specify the intention. I anticipant this will change w/ AI in the near future. Second, ORM models data following the dogmatic data normalization, on which queries are destined to be horrible. I believe that people should take a moment to view their data, think what computations they want to do on top, estimate how expensive they may be, and finally settle on a reasonable model overall. Ask ORM (or maybe AI) to help with constructing and sending queries and assembling results. But do not delegate data modeling out. With right data modeling that fits computations, queries cann't be that bad.
Yes, I was going to say, seeing the generated SQL can be almost useless depending on the execution plan.
When you have a solid view of the schema and data sizes you can start to be more predictive about what your code will actually do, THEN you can layer on the complexity of the ORM hell code.
I find query planning (and databases in general) to be very difficult to reason about, basically magic. Does anyone have some recommended reading or advice?