The main point didn't get hit on by the responses. Re-ranking is just a mini-LLM (for latency/cost reasons) that does a double heck. Embedding model finds the closest M documents in R^N space. Re-ranker picks the top K documents from the M documents. In theory, if we just used Gemini 2.5 Pro or GPT 5 as the re-ranker, the performance would even be better than whatever small re-ranker people choose to use.