This article speaks to me. So many times I have needed to go back and fix queries that were naively written this way like it was some kind of "optimization". There is no difference in effort between writing a join and doing the ORM double round-trip in the vast majority of cases. People are so afraid of joins that I see them collecting ids with a subselect instead, because "joins are slow". The worst is usually some kind of pseudo-join followed by an aggregate or filtering in the application code. It drives me up the wall when I see it in code review, usually because I get into some argument about "joins are slow" (with no evidence) and then I have to go rewrite the query, and maybe add an index, to show that, yes - an aggregate that takes seconds and a ton of memory in application code can in fact take milliseconds in the database.
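To make that concrete, here's the kind of thing I mean, sketched against a made-up orders(id, customer_id, amount) table - one GROUP BY instead of fetching every row and summing in a loop:

-- instead of SELECT * and summing per customer in application code:
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id;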
The NoSQL people have really done a lot of brain damage to this industry.
It's so pervasive that I've started using this kind of question in our technical interviews; doing a double round-trip ends the interview for anyone above junior level.
Here is some actionable advice for speeding up a join with indexes. You'll need one on each table, covering the join columns. For example:
SELECT ...
FROM table_A
JOIN table_B ON table_A.column_A1 = table_B.column_B1
AND table_A.column_A2 = table_B.column_B2
You can add indexes like this:
- table_A(column_A1, column_A2)
- table_B(column_B1, column_B2)
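In SQL that would look something like this (the index names are just illustrative):

CREATE INDEX idx_a_join ON table_A (column_A1, column_A2);
CREATE INDEX idx_b_join ON table_B (column_B1, column_B2);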
If both tables are large enough, this query can probably take advantage of those indexes to perform a merge join.
Postgres tip: you can also add columns to the INCLUDE part of the index to speed up filters in the WHERE conditions. You might even get an index-only scan! Look into covering indexes to learn more about it.
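For example, something along these lines (Postgres 11+; some_filter_column is a made-up column you'd be filtering on in the WHERE clause):

-- the INCLUDE'd column is stored in the leaf pages but is not part of the key
CREATE INDEX idx_a_covering ON table_A (column_A1, column_A2) INCLUDE (some_filter_column);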
Not related, but at my job one of our queries just stopped working because the data got too big. We use LEFT JOINs everywhere because we don't want to lose the main table's data. Do you think the trick you just mentioned could optimize our LEFT JOIN as well?
Most likely, yes.
Composite indexes are really cool, because if the database can satisfy a join from the index alone, it can usually perform it completely in memory instead of doing a table scan and hitting the disk for the values.
Another trick people usually shy away from: temporary tables.
It might seem slow and wasteful to create a table (plus indices) just to store results for a fraction of a second, but for very large tables with large indices, extracting a smaller slice of the base table, indexing it, and joining against that can be orders of magnitude faster!
There is even dedicated syntax for that: CREATE TEMPORARY TABLE. Temporary tables are local to the connection and get dropped automatically at the end of the SQL session.
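A minimal sketch of the pattern, with made-up table and column names:

-- extract the small, relevant slice of a huge table
CREATE TEMPORARY TABLE recent_orders AS
SELECT id, customer_id, amount
FROM orders
WHERE created_at > now() - interval '7 days';

-- an index on the temp table is cheap to build because the table is small
CREATE INDEX idx_recent_orders_customer ON recent_orders (customer_id);

-- join against the much smaller temp table instead of the full orders table
SELECT c.name, SUM(r.amount) AS total
FROM customers c
JOIN recent_orders r ON r.customer_id = c.id
GROUP BY c.name;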
They are also great for storing the results of (non-correlated) subqueries, because for large result sets not every database is able to find the proper optimizations - MySQL versions < 8, for example.
I really recommend trying that one. So far I've been able to fix every "query takes too long" problem that resisted other solutions that way.
I suppose so. I would just check that these tables are vacuumed often (preferably with autovacuum); otherwise the query planner might decide not to use the index, depending on the state of the visibility map.
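If you're on Postgres, pg_stat_user_tables shows when (auto)vacuum last ran, e.g. for the tables from the example above:

SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('table_a', 'table_b');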
Agreed, partial indexes were a game changer for me. After that, window functions opened up a whole new world of awesomeness. Nothing has been the same since I figured out how to use them effectively. One of the biggest eureka moments of my entire career.
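For anyone who hasn't tried them, a tiny illustrative sketch (made-up orders table): latest order per customer, no self-join needed:

SELECT *
FROM (
  SELECT id, customer_id, amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
  FROM orders
) latest
WHERE rn = 1;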
They can be used to speed up queries that have WHERE clauses, so I can see how that might have caused some confusion, since partial indexes have WHERE clauses in their definition.
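For comparison, a partial index looks something like this (made-up status column) - here the WHERE clause is part of the index definition, so only matching rows get indexed and the index stays small:

CREATE INDEX idx_orders_pending ON orders (customer_id) WHERE status = 'pending';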
Interesting, I've never seen INCLUDE before - probably because we use a version of Postgres that does not support it - but I see now that it can be useful compared to a plain CREATE INDEX idx ON table (colA, colB), because the INCLUDE'd columns don't have to participate in the b-tree ordering. Good stuff. Just another piece of ammunition for upgrading our db to a newer version with all sorts of new goodies I've been reading about lately.
> This article speaks to me. So many times I have needed to go back and fix queries that were naively written this way like it was some kind of "optimization"
In some cases, doing joins in the application is more performant than making the database do it. It's usually better to do it with a join, but depending on the data you're joining you might incur significant slowdowns.
It's always better to start with the join and only evaluate an application-side join if there is a need to improve performance, however. Nonetheless, a sweeping statement like yours doesn't help either.