Computing a SHA1 (Ed: for every password in the file) is hardly free, which suggests rather than minimize total time you want to minimize latency for a service.
In which case the obvious improvement is to load as much of the password file into RAM as possible while waiting for the request and then only search the disk if it’s non in RAM. As to whether you want to load prefixes so you can quickly reject passwords not in the file or full passwords to acknowledge the password as soon as possible probably depends on how likely the matches are.
Modern CPUs have hardware acceleration for computing SHA1 and SHA2. They can be effectively free if they can be computed by CPU cores faster than the data from disk (likely NVMe now) are committed to RAM.
But a crypto hash like SHA may be a wrong hash here. Something like murmur / city / metro hash may be a better solution, since they compute 3-5 bytes of hash per clock.
You are being down voted because although your point is valid, it has zero relevance because the goal is fast lookup. Preprocessing cost is not a factor.
I'm saying that adding hashes is basically free, given the right hash function.
Sorting is very much not free though. I don't see why would sorting be needed if you have an efficient hash table, or a search tree. Essentially TFA describes just that, an index file + data file approach. They could have used SQLite which readily provides indexed access, instead of pure Python.
In which case the obvious improvement is to load as much of the password file into RAM as possible while waiting for the request and then only search the disk if it’s non in RAM. As to whether you want to load prefixes so you can quickly reject passwords not in the file or full passwords to acknowledge the password as soon as possible probably depends on how likely the matches are.