Consider me nerd-sniped. I typed the last post on my phone, but I had to bust out the laptop to check. I noticed a couple of bugs in my version: the for-loop initializers need to be set, and there needs to be an empty statement after the cont: label.
The goto version has only one unconditional branch outside of a loop, rather than three conditional branches inside loops. Anyway, the speed difference shouldn't be large, since those branches are very predictable (almost always not taken).
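The code from the earlier post isn't reproduced here, so what follows is only a sketch of the idea. It assumes the "original" stops its triple-nested search by re-checking a `done` flag in every loop condition; the names `triples_flag`, `triples_goto`, and `Triple` are hypothetical. It also shows both fixes mentioned above: every for-loop initializer is set explicitly, and the `cont:` label at the very end of the function body is followed by an empty statement `;`, which C and C++ require before C23/C++23.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical names; a sketch, not the code from the earlier post.
using Triple = std::tuple<int, int, int>;

// Flag version: each of the three loop conditions re-tests `done`,
// so every loop iteration carries an extra conditional branch.
void triples_flag(int n, std::vector<Triple>& out) {
    assert(n > 0);  // with n == 0 the search would never terminate
    int count = 0;
    bool done = false;
    for (int z = 1; !done; ++z)
        for (int x = 1; !done && x <= z; ++x)
            for (int y = x; !done && y <= z; ++y)
                if (x * x + y * y == z * z) {
                    out.emplace_back(x, y, z);
                    done = (++count == n);
                }
}

// goto version: the loops test only their own bounds; one unconditional
// jump, taken exactly once, leaves all three loops at the end.
void triples_goto(int n, std::vector<Triple>& out) {
    assert(n > 0);  // with n == 0 the search would never terminate
    int count = 0;
    for (int z = 1;; ++z)  // every initializer is set explicitly
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x * x + y * y == z * z) {
                    out.emplace_back(x, y, z);
                    if (++count == n) goto cont;
                }
cont:;  // a label must be followed by a statement (before C23/C++23)
}
```

Both functions enumerate triples in the same order (by z, then x, then y), so for `n = 5` each should produce (3,4,5), (6,8,10), (5,12,13), (9,12,15), (8,15,17).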
Quick benchmark of the original and mine, up to the first 3000 triples:
$ c++ -O3 pyth1.cpp && time ./a.out | md5
33aa33d6cad59951489757e06aeb5a15
real 0m5.492s
user 0m5.484s
sys 0m0.007s
$ c++ -O3 pyth2.cpp && time ./a.out | md5
33aa33d6cad59951489757e06aeb5a15
real 0m4.248s
user 0m4.238s
sys 0m0.011s