Other ideas, assuming on modern intel (or similar cache size) architecture -- you can load multiple ints at a time and do a bitwise or on the bits higher your largest number.
Indeed, asking the question of "Why are we doing this" and "What problem does this solve" is a great way to start a discussion of any engineering solution, and a great way to avoid doing unnecessary work.
There are a lot of things that fundamentally deal with either systems that users interact with or that run on real machines with physical world implications that also make things perform differently, from user interaction latency and pre-computing and pre-caching, to SSD block level reads/writes, cpu caches and other caches, network latency implications, etc.
Similarly, on the algorithm front from things like Boyer–Moore to work being done on non-cryptographic hashes (ie, https://github.com/rurban/smhasher/ ) that is sometimes mind boggling.
There's also spagetti sort equivalent https://en.wikipedia.org/wiki/Parallel_random-access_machine
Also https://en.wikipedia.org/wiki/Grover%27s_algorithm
Original of this: https://www.quora.com/What-is-the-fastest-algorithm-to-find-...