Consider me nerd-sniped. I typed the last post on my phone, but I just had to bust out the laptop to check. I noticed a couple of bugs in my version: the for-loop initializers need to be set, and there needs to be an empty statement after the cont: label.

The goto version has only one unconditional branch outside of a loop, rather than three conditional branches inside the loops. Anyway, the speed difference shouldn't be too big, since those branches are very predictable (almost always not taken).
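For anyone wondering about the two fixes, here's a minimal sketch of the language rules involved. To be clear, this is not the code from my last post: first_triples and its loop structure are made up for illustration, and only the cont: label name is borrowed. Before C++23 a label has to be attached to a statement, so a label sitting right before a closing brace needs an empty statement (a lone semicolon) after it. The loop variables also need explicit initial values in the for-initializers.

```cpp
#include <array>
#include <vector>

// Hypothetical helper for illustration only; not the code from the earlier post.
// Returns the first n Pythagorean triples (x, y, z) with x <= y < z,
// ordered by increasing z, then by increasing x.
std::vector<std::array<int, 3>> first_triples(int n) {
    std::vector<std::array<int, 3>> out;
    if (n <= 0) return out;
    for (int z = 1; ; ++z) {              // initializer set: z starts at 1
        for (int x = 1; x < z; ++x) {     // initializer set: x starts at 1
            for (int y = x; y < z; ++y) { // initializer set: y starts at x
                int d = x * x + y * y - z * z;
                if (d == 0) {
                    out.push_back({x, y, z});
                    if (static_cast<int>(out.size()) == n) return out;
                }
                // x*x + y*y only grows with y, so once it reaches z*z
                // no larger y can match for this x: move on to the next x.
                if (d >= 0) goto cont;
            }
        cont: ; // the empty statement is required here before C++23
        }
    }
}
```

Calling first_triples(3) should give (3,4,5), (6,8,10), (5,12,13). Dropping the semicolon after cont: is a compile error under C++20 and earlier, which is the kind of thing you don't catch when typing on a phone.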

Quick benchmark of the original and mine, each generating the first 3000 triples:

  $ c++ -O3 pyth1.cpp  && time ./a.out | md5
  33aa33d6cad59951489757e06aeb5a15
  
  real    0m5.492s
  user    0m5.484s
  sys     0m0.007s
  $ c++ -O3 pyth2.cpp  && time ./a.out | md5
  33aa33d6cad59951489757e06aeb5a15
  
  real    0m4.248s
  user    0m4.238s
  sys     0m0.011s