This isn't actually correct, is it? Assuming an average 6-letter word length, there are about 10^67 valid combinations that your iterate_all() can go through:
(((2^6) * 1000) ^ 12) * (94 ^ 5) = 3.5 * 10^67
Since this is much larger than the total hash space (2^160 ≈ 1.5 * 10^48), the best you can hope for is a collision after trying about half the hash space, i.e. roughly 7 * 10^47 hashes. To do that in 5 minutes of runtime, you'd have to generate on the order of 10^45 hashes per second.
Bottom line: if you can run this on 200,000 cores, you'll have to generate 10^40 hashes per second per core, and that seems impossible to me.
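The arithmetic above is easy to sanity-check. A small Python snippet (the contest parameters — 12 words, a 1000-word list, 6-letter words with free case, a 5-character printable suffix — are as assumed in the post):

```python
# Sanity-check of the keyspace arithmetic (assumed contest parameters).
word_choices = (2**6) * 1000          # case permutations of a 6-letter word x 1000 words
keyspace = word_choices**12 * 94**5   # 12-word phrase + 5 printable-ASCII suffix chars
hash_space = 2**160                   # SHA-1 output space

print(f"keyspace   ~ {keyspace:.2e}")     # ~3.5e67
print(f"hash space ~ {hash_space:.2e}")   # ~1.5e48
# Birthday-style collision hope: half the hash space in 5 minutes.
print(f"hashes/sec ~ {hash_space / 2 / 300:.2e}")
```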
Weeell we have 20,023 computers (that fluctuates a tad). That's probably about 30,000 cores in the end.
I can't iterate the whole lot - but I can cover a substantially larger portion of the keyspace in 5 minutes than most could in 30 hours. I bet 5 minutes will easily get me damn close :)
I'm seeing about 1 million hashes per second on a single core, so 30,000 cores gets you about 3 * 10^10 hashes per second. At that rate, it will only take you a few times 10^49 years to cover the keyspace.
You won't even cover a fraction of a percent of the keyspace before the sun burns out.
I think it unlikely you have a compute cluster that can generate 3.3 million billion billion billion (3.3 * 10^33) SHA1 hashes per second (edit: and that assumes not adding the random 5 characters to the end). The actual contest will have a word list of a thousand words.
Also, (a) you're allowed to permute case of letters, exploding the search space even more, which is handy because (b) I presume the target sha1 will come from words not on the word list, making an exact match unlikely (hence score based on hamming distance).
500 cores gives you no meaningfully higher chance of getting a good combination than someone with one PC.
Yes, you get 500 times the attempts in the 30-hour period. But if you run the figures, it's something like a 0.00000001% difference in the amount of keyspace you can cover :)
AKA just run it on a couple of PCs and hope you get lucky!
Apparently, I am not the only one who is into using the university cluster for my own personal projects ;-)
Be careful. This is not "exactly" legal, and it could hurt you. Make sure you also throw some legitimate computational work at the clusters so you have an alibi at least.
I'm being lazy about the search, depending on the random distribution to keep work from overlapping too much instead of taking the time to keep track of what has been done, or implementing a systematic search of the possible combinations.
My thought is that there are so many possibilities and so few CPU cycles available to me that making random guesses is a better way to approach it. Kinda like winning the lottery, except harder.
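For what it's worth, the random-guess approach can be sketched in a few lines of Python. Everything here is a stand-in: the real word list isn't published yet, and the phrase format (12 words, random case, 5-character printable suffix) is assumed from the discussion above:

```python
import hashlib
import random

# Stand-in for the real (unpublished) 1000-word contest list.
WORDS = ["apple", "banana", "cherry"]
# 94 printable ASCII characters for the assumed 5-char suffix.
SUFFIX_CHARS = [chr(c) for c in range(33, 127)]

def random_case(word):
    # Randomly permute the case of each letter.
    return "".join(c.upper() if random.random() < 0.5 else c for c in word)

def random_guess():
    # 12 randomly-cased words plus 5 random printable characters, then SHA-1.
    phrase = "".join(random_case(random.choice(WORDS)) for _ in range(12))
    phrase += "".join(random.choice(SUFFIX_CHARS) for _ in range(5))
    return phrase, hashlib.sha1(phrase.encode()).hexdigest()

phrase, digest = random_guess()
```

No bookkeeping of what's been tried: with a keyspace this size, the chance of two random guesses colliding is negligible anyway.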
I've got mine in C. I probably won't run it, just wrote it for fun. Honestly, if you were thinking of spending a cent on this competition, I think you'd get a better ROI buying a SuperLOTTO ticket.
I'm getting about 250k sha1 hamming distance calculations per core per second, on an Intel(R) Xeon(R) CPU 3050 @ 2.13GHz (from /proc/cpuinfo). This is a full implementation... I can just drop in the word list and challenge phrase on game day.
Intel(R) Xeon(R) CPU 5148 @ 2.33GHz gives me 300k/sec.
I'm getting more than 1M SHA1 + Hamming Distance calcs/core/sec on an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, and I haven't really optimized beyond the obvious.
It's a brutally simple algorithm, really: XOR + bitcount, where the count loop only iterates once per 1 bit, so the lower the total hamming distance, the faster it runs.
It's in C, so everything works in unsigned long blocks. Compiled with -O3 it's:
real 0m0.968s
for 1M each on two forked processes, and 80 lines of code.
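The XOR + bit-count trick described above (Kernighan's method: each iteration clears the lowest set bit, so the loop runs once per differing bit) can be sketched in Python like this — not the poster's C code, just the same idea:

```python
import hashlib

def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance between two equal-length digests: XOR them,
    then count set bits by clearing the lowest 1 bit per iteration."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    count = 0
    while x:
        x &= x - 1   # clear the lowest set bit (Kernighan's trick)
        count += 1
    return count

target = hashlib.sha1(b"target phrase").digest()
guess = hashlib.sha1(b"some guess").digest()
d = hamming(target, guess)   # 0..160 for SHA-1 digests
```

In C you'd work on unsigned long blocks instead of one big integer, but the loop is the same.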
It uses the multiprocessing package, which is new in Python 2.6. From the docs:
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
http://docs.python.org/library/multiprocessing.html
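A minimal sketch of what that looks like for this problem — Pool.map farms the candidate strings out to worker processes, so the GIL never serializes the hashing. The candidate strings and pool size here are placeholders:

```python
import hashlib
from multiprocessing import Pool

def sha1_hex(s):
    # Hash one candidate phrase; runs in a worker process.
    return hashlib.sha1(s.encode()).hexdigest()

if __name__ == "__main__":
    candidates = ["guess-%d" % i for i in range(1000)]  # placeholder inputs
    with Pool(processes=4) as pool:
        digests = pool.map(sha1_hex, candidates)
    print(len(digests))  # 1000
```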
This is my code (using mighty, our Python-based cluster service):
Done. Takes about 5 minutes (I won't enter - it's not fair).
EDIT: added CRITICAL as the priority or it takes longer :)
EDIT: ok so the code is a bit the wrong side of the tracks. It wouldn't take long to refactor it :)