I totally agree--thank you for bringing this up. I'm not sure it would be faster, but it is an interesting case for goto.
I saw another proposal referencing the original article which used a trio of functions in a nice way to achieve something similar without the goto (and without the ifs). My main point was to illustrate the use of custom iterators.
Consider me nerd-sniped. I typed the last post on my phone, but just had to bust out the laptop to check. I noticed a couple of bugs in what I posted: the for-loop initializers need to be set, and there needs to be an empty statement after the cont: label.
The goto version has only one unconditional branch outside of a loop, rather than three conditional branches inside loops. Anyway, the speed difference shouldn't be too big, as the branches are pretty predictable (pretty much always not taken).
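For anyone following along, here is roughly what a goto-resumed generator of that shape could look like. It is a sketch under assumptions, not the code that was benchmarked: the names TripleGen and first_triples are made up for illustration, and starting the state at (1, 1, 1) is my own choice so that the jump can stay unconditional.

```cpp
#include <array>
#include <vector>

// Hypothetical sketch: TripleGen and first_triples are illustrative names,
// not taken from the original post or article.
struct TripleGen {
    // Start at (1, 1, 1), which is not a triple, so the very first call to
    // next() can jump straight into the loops and advance from there.
    int x = 1, y = 1, z = 1;

    // Advance to the next triple with x <= y <= z and x*x + y*y == z*z.
    void next() {
        goto cont;  // unconditional jump back to where the last call left off
        for (z = 1; ; ++z)              // never reached here, since we always resume
            for (x = 1; x <= z; ++x)    // x and y initializers do run, each time
                for (y = x; y <= z; ++y)  // an enclosing loop advances
                    if (x * x + y * y == z * z) {
                        return;         // triple found; x, y, z hold it
                        cont: ;         // a label needs a statement after it (before C++23)
                    }
    }
};

// Collect the first n triples by calling next() repeatedly.
std::vector<std::array<int, 3>> first_triples(int n) {
    TripleGen g;
    std::vector<std::array<int, 3>> out;
    for (int i = 0; i < n; ++i) {
        g.next();
        out.push_back({g.x, g.y, g.z});
    }
    return out;
}
```

With this ordering (z outermost), first_triples(3) gives (3, 4, 5), (6, 8, 10), (5, 12, 13). The jump is legal because the loop initializers are plain assignments to members rather than declarations, so the goto does not skip any variable initialization.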
Quick benchmark of the original and mine, up to the first 3000 triples:
$ c++ -O3 pyth1.cpp && time ./a.out | md5
33aa33d6cad59951489757e06aeb5a15
real 0m5.492s
user 0m5.484s
sys 0m0.007s
$ c++ -O3 pyth2.cpp && time ./a.out | md5
33aa33d6cad59951489757e06aeb5a15
real 0m4.248s
user 0m4.238s
sys 0m0.011s
I'm always looking for a legitimate excuse to use a goto in my C++ code because I'm perverse. It's very rare to find one, but this looks like it might be okay.