I'm saying that adding hashes is basically free, given the right hash function.
Sorting is very much not free, though. I don't see why sorting would be needed if you have an efficient hash table or a search tree. Essentially, TFA describes just that: an index file + data file approach. They could have used SQLite, which readily provides indexed access, instead of pure Python.
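To make the SQLite suggestion concrete, here is a rough sketch using Python's standard `sqlite3` module. The table name `entries`, the column names, and the helpers `build_index` and `lookup` are made up for illustration; nothing here is taken from TFA. The point is only that a `PRIMARY KEY` column gives you a B-tree index for free, so a lookup never needs the data to be sorted by you.

```python
import sqlite3


def build_index(conn, records):
    """Store (key, value) pairs; the PRIMARY KEY gets an automatic index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?)", records)
    conn.commit()


def lookup(conn, key):
    """Indexed point lookup; returns None if the key is absent."""
    row = conn.execute(
        "SELECT value FROM entries WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data, in memory so nothing touches disk.
conn = sqlite3.connect(":memory:")
build_index(conn, [("a1b2", "first"), ("c3d4", "second")])
print(lookup(conn, "c3d4"))
```

For a real data set you would pass a file path to `sqlite3.connect` instead of `":memory:"`, so the index survives between runs.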
Of course if you never do the same search again, you're back to mmap-ing the file and doing a full scan smartly, like GNU grep does (https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...).
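For the one-off case, a minimal sketch of "mmap the file and scan it" in Python could look like the following. The function name `scan_file` is hypothetical, and this is not a claim about how GNU grep is implemented; it only shows the mmap + substring search idea with the standard `mmap` module. Note that `mmap` refuses to map an empty file, so a real tool would need to handle that case.

```python
import mmap


def scan_file(path, needle):
    """Return the byte offset of every occurrence of needle in the file."""
    if not needle:
        raise ValueError("needle must be a non-empty bytes object")
    offsets = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages it in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                offsets.append(pos)
                # Resume one byte later so overlapping matches are found too.
                pos = mm.find(needle, pos + 1)
    return offsets
```

Usage would be something like `scan_file("data.bin", b"needle")`, where `data.bin` is a placeholder path.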